I have Django models setup in the following manner:
model A has a one-to-many relationship to model B
each record in A has between 3,000 to 15,000 records in B
What is the best way to construct a query that will retrieve the newest (greatest pk) record in B that corresponds to a record in A for each record in A? Is this something that I must use SQL for in lieu of the Django ORM?
Create a helper function for safely extracting the 'top' item from any queryset. I use this all over the place in my own Django apps.
def top_or_none(queryset): """Safely pulls off the top element in a queryset""" # Extracts a single element collection w/ top item result = queryset[0:1] # Return that element or None if there weren't any matches return result if result else None
This uses a bit of a trick w/ the slice operator to add a limit clause onto your SQL.
Now use this function anywhere you need to get the 'top' item of a query set. In this case, you want to get the top B item for a given A where the B's are sorted by descending pk, as such:
latest = top_or_none(B.objects.filter(a=my_a).order_by('-pk'))
There's also the recently added 'Max' function in Django Aggregation which could help you get the max pk, but I don't like that solution in this case since it adds complexity.
P.S. I don't really like relying on the 'pk' field for this type of query as some RDBMSs don't guarantee that sequential pks is the same as logical creation order. If I have a table that I know I will need to query in this fashion, I usually have my own 'creation' datetime column that I can use to order by instead of pk.
<strong>Edit based on comment:</strong>
If you'd rather use queryset, you can modify the 'top_or_none' function thusly:
def top_or_none(queryset): """Safely pulls off the top element in a queryset""" try: return queryset except IndexError: return None
I didn't propose this initially because I was under the impression that queryset would pull back the entire result set, then take the 0th item. Apparently Django adds a 'LIMIT 1' in this scenario too, so it's a safe alternative to my slicing version.
Of course you can also take advantage of Django's related manager construct here and build the queryset through your 'A' object, depending on your preference:
latest = top_or_none(my_a.b_set.order_by('-pk'))
I don't think Django ORM can do this (but I've been pleasantly surprised before...). If there's a reasonable number of A record (or if you're paging), I'd just add a method to A model that would return this 'newest' B record. If you want to get a lot of A records, each with it's own newest B, I'd drop to SQL.
remeber that no matter which route you take, you'll need a suitable <strong>composite</strong> index on B table, maybe adding an
order_by=('a_fk','-id') to the