Blog

Ideas and insights from our team

Advanced Django querying: sorting events by date


Imagine the situation where our application has events (scheduled tasks, appointments, python conferences across the world) happening in different moments of time. Almost anything with a date attached to it. We want to display them in a simple list to the user. Given we are in February 2017 (the date this post was written), what would be the best way to order these events from the user's point of view?

To make this easier to answer, we can change the question a little bit: what entry in the list would be the most valuable to the user? I suppose a good answer would be DjangoCon Europe, which is the closest upcoming event. Cool, so let's display it first. What is the second most valuable entry? Well, considering that PyCamp Argentina and PyCon Brasil are long passed, there is not much the user can do about them. So PyCon US would be a good choice. Now we are left only with the two passed entries. Which should come next? My personal opinion is that the most recent events should come first. The older an entry is, the less I care about it. So, here is the final order we came up with:

Let's take a close look at what we are doing here. First, we have upcoming events ordered chronologically [a.k.a. in the order they happen] and then we have passed events ordered in the reverse chronological order. Weird.

Alright, now how do we query these events in our database to show them in our Django view?

The naive approach

A simple solution would be to make two separate queries and concatenate the results. This is how it would look like:

from django.utils import timezone
from myapp.models import Event

class EventListView(generics.ListView):
    def get_queryset(self):
        now = timezone.now()
        upcoming = Event.objects.filter(date__gte=now).order_by('date')
        passed = Event.objects.filter(date__lt=now).order_by('-date')
        return list(upcoming) + list(passed)

The main problem here is that by making these two queries we will be fetching ALL events from the database. This will be a problem once we have many entries. The usual solution to avoid retrieving all the data from the database is to do pagination. In this case, pagination won't help. Both queries will always need to run before so we have a single list and can then slice the page from it. Not cool.

The sagacious way

Alright, so what we want is a way to return the events in the correct order in a single query. To achieve this we will need to use some more advanced tools from the Django ORM. We'll break the solution into two parts. In the first part, we are going to separate upcoming events from previous ones and make upcoming ones show up first.

We are going to use Django's Case, When queryset operators. I'm not going to explain them in detail in this blog post. If you don't know how they work I recommend reading this other blog post. The other feature we are going to use is annotations. You can read about it in the Django documentation

Here is the first part of our query:

now = timezone.now()
(Event.objects.annotate(
    relevance=models.Case(
        models.When(date__gte=now, then=1),
        models.When(date__lt=now, then=2),
        output_field=models.IntegerField(),
    )).order_by('relevance'))

We are annotating upcoming events with relevance = 1 and passed ones with relevance = 2. When we order the query by relevance upcoming events will be placed in front of passed ones. This is what we achieved so far:

Pretty neat huh?! But we're still not done. Upcoming events are showing in the correct order, but passed ones are reversed. The second part is a little more tricky because upcoming events go in ascendant date order and previous in the descendant date order. The solution is to annotate the time diff from current date to the event date.

now = timezone.now()
(Event.objects.annotate(
    relevance=models.Case(
        models.When(date__gte=now, then=1),
        models.When(date__lt=now, then=2),
        output_field=models.IntegerField(),
    )).annotate(
    timediff=models.Case(
        models.When(date__gte=now, then=F('date') - now),
        models.When(date__lt=now, then=now - F('date')),
        output_field=models.DurationField(),
    )).order_by('relevance', 'timediff'))

Notice that upcoming events are annotated with event_date - current_date and passed ones with current_date - event_date. When we use timediff as the second parameter in the order_by it will make the former to be returned in the chronological order and the later in the reverse chronological order.

Hooray! Mission accomplished, we can now get events in the correct order from a single database query. It can be paginated and will scale nicely.

BTW: Vinta will be present [and giving talks] at both DjangoCon Europe and PyCon US this year.
If you are attending DjangoCon Europe, you should come by to watch Lais talking about our Django/React boilerplate project (Defining a customizable boilerplate using Django, React and Bootstrap) and Flávio talking about Qualities of great reusable Django apps.
At Pycon US look for Flávio in his How to make a good library API talk!

About Filipe Ximenes

Bike enthusiast, software developer and former director at Python Brasil Association. Likes open source and how people interact in open source communities.

Comments