Database concurrency in Django the right way


When developing applications that have specific needs for running asynchronous tasks outside the web application, it is common to adopt a task queue such as Celery. This allows, for example, the server to handle a request, start an asynchronous task responsible for doing some heavyweight processing, and return a response while the task is still running.

Building upon this example, the ideal approach is to separate the time-consuming parts from the view processing flow and run them in a separate task. Now, let's suppose we have to do some database operations both in the view and in the separate task. If not done carefully, those operations can become a source of issues that are hard to track down.

ATOMIC_REQUESTS database config parameter

It's a common practice to set the ATOMIC_REQUESTS parameter to True in the database configuration. This setting makes Django run each view inside a transaction, so that if an exception is raised during request handling, Django can simply roll the transaction back. It also ensures the database is never left in an inconsistent state: since a transaction is an atomic operation in the database, no other application accessing the database while the transaction is running will see inconsistent data from an incomplete transaction.
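For reference, enabling it is just a matter of adding the key to the database settings. A minimal sketch (the engine and database name below are placeholders, not taken from the project in this post):

# settings.py
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'myproject',
        # Wrap every request handled by Django in a transaction on this database
        'ATOMIC_REQUESTS': True,
    }
}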

Database race condition with tasks and Django request handlers

Data races happen when two or more concurrent threads try to access the same memory address (or, in our case, some specific data in a database) at the same time. This can lead to non-deterministic results if, say, one thread is reading the data while the other is writing it, or if both are writing at the same time. Of course, if both threads are only reading the data, no problem occurs.
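To make the lost-update pattern concrete, here is a tiny sketch that has nothing to do with Django: two threads read a shared counter, yield to each other, and write back, so many increments from one thread end up overwritten by the other.

import threading
import time

counter = 0


def increment_many():
    global counter
    for _ in range(1000):
        value = counter      # read the shared value
        time.sleep(0)        # yield, giving the other thread a chance to run
        counter = value + 1  # write back, possibly overwriting its update


threads = [threading.Thread(target=increment_many) for _ in range(2)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # usually well below 2000 because of lost updates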

Now, regarding our problem, let's start by writing a simple view, which will write some data to our database:

from django.views.generic import View
from django.http import HttpResponse
from .models import Data


class SimpleHandler(View):

    def get(self, request, *args, **kwargs):
        my_data = Data.objects.create(name='Dummy')
        return HttpResponse('pk: {pk}'.format(pk=my_data.pk))

And here's our model:

from django.db import models


class Data(models.Model):
    name = models.CharField(max_length=50)

This models a very simple request handler. If we make a request from our browser, the response will contain the primary key of the inserted row, such as pk: 41. Now, let's modify our get method to launch a Celery task that fetches data from our database:

    def get(self, request, *args, **kwargs):
        my_data = Data.objects.create(name='Dummy')
        do_stuff.delay(my_data.pk)
        return HttpResponse('pk: {pk}'.format(pk=my_data.pk))

And in our tasks file:

from celery_test.celery import app
from .models import Data


@app.task
def do_stuff(data_pk):
    my_data = Data.objects.get(pk=data_pk)
    my_data.name = 'new name'
    my_data.save()

(Don't forget to import do_stuff in your views file!)

Now we have a subtle race condition: it is likely that Celery will raise an exception when fetching the data, such as Task request_handler.tasks.do_stuff[2a3aecd0-0720-4360-83b5-3558ae1472f2] raised unexpected: DoesNotExist('Data matching query does not exist.',). It might seem that this should not happen, since we are inserting a row, getting its primary key, and passing it to the task, so the data matching the query should exist. But, as said earlier, if ATOMIC_REQUESTS is set to True, the view runs in a transaction. The data will only be visible to other connections when the view finishes its execution and the transaction is committed, and this usually happens after Celery has already started executing the task.

Solution approaches

There are many solutions to this problem. The first and most obvious one is to set ATOMIC_REQUESTS to False, but we want to avoid that since it would affect every other view in the project, and using transactions in requests has many advantages, as stated earlier. Another solution is to use the non_atomic_requests decorator, which affects only one view. Still, this can be undesirable, since we may be compromising that one view's integrity. There are also libraries that used to be the way to run code when the current transaction is committed, such as django_atomic_celery and django-transaction-signals, but those are now legacy and should not be used. An explanation can be found in the django-transaction-signals project.
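For the record, if you did want to disable transactions for this single view, the decorator would look roughly like the sketch below (shown as a function-based view for brevity; for a class-based view you would apply it to dispatch via method_decorator):

from django.db import transaction
from django.http import HttpResponse

from .models import Data
from .tasks import do_stuff


@transaction.non_atomic_requests
def simple_handler(request):
    # This view now runs outside of the request-wide transaction, so a
    # failure here will not be rolled back automatically.
    my_data = Data.objects.create(name='Dummy')
    do_stuff.delay(my_data.pk)
    return HttpResponse('pk: {pk}'.format(pk=my_data.pk))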

The current and most widely used alternative is to use hooks. For Django >= 1.9, you can use the on_commit hook, and for Django < 1.9, django-transaction-hooks. We'll use the on_commit approach here, but if you have to use django-transaction-hooks, the solution is very similar.

All you have to do is import transaction in your views file (from django.db import transaction) and pass the function you want to execute after the commit to transaction.on_commit. This function must take no arguments, but we can wrap our task call in a lambda, so our final view looks like this:

class SimpleHandler(View):

    def get(self, request, *args, **kwargs):
        my_data = Data.objects.create(name='Dummy')
        transaction.on_commit(lambda: do_stuff.delay(my_data.pk))
        return HttpResponse('pk: {pk}'.format(pk=my_data.pk))

The only caveat of the above solution is that you'll have to use TransactionTestCase to properly test views that use transaction.on_commit. Still, on_commit is good enough for most applications. If you need more control, you can use the non_atomic_requests decorator, but remember that you will then have to deal with rollbacks manually.
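Here is a minimal sketch of such a test, using unittest.mock. The URL /simple/ and the app name request_handler are assumptions based on the examples above, so adjust them to your project:

from unittest import mock

from django.test import TransactionTestCase

from .models import Data


class SimpleHandlerTest(TransactionTestCase):

    @mock.patch('request_handler.views.do_stuff')
    def test_task_is_sent_after_commit(self, do_stuff):
        # '/simple/' is assumed to be routed to SimpleHandler in urls.py
        response = self.client.get('/simple/')
        self.assertEqual(response.status_code, 200)
        my_data = Data.objects.get(name='Dummy')
        # TransactionTestCase really commits, so the on_commit callback runs
        # and the task is queued with the committed row's primary key.
        do_stuff.delay.assert_called_once_with(my_data.pk)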


About Guilherme Caminha

Software developer at Vinta Software.
