Postgresql – Improve query performance of subquery with MAX()

greatest-n-per-groupindexjoin;postgresql

I am looking to migrate our support database from MS SQL Server to Postgres and most of it has gone OK. However, some of the queries seem to be slower on PG than on the SQL Server.

We have a ticket table which has a related table assignments. In the assignment table (linked by ticket_id) we can make comments and assign the ticket to another group to work on the ticket. So to find out who currently has the ticket we need to get the last update for the ticket

Select tbltickets.ticketid, tbltickets.ticketsummary, ta2.*
from tblTickets 
Join (
   SELECT tblassignment.ticketid,
          max(tblassignment.allocationid) AS maxallocationid
   FROM tblassignment 
   join tbltickets on tblassignment.ticketid = tbltickets.ticketid
   GROUP BY tblassignment.ticketid) as ta on tbltickets.ticketid = ta.ticketid
join tblassignment ta2 on ta.maxallocationid = ta2.allocationid
where tbltickets.closed is NULL;

So it works but it slower than we would like circa 2seconds and will only get slower.

it is looking up the maxallocationid subquery that is taking the time at 1.5s as it is doing the calculation for all the assignments, not just the ones for the tickets we are interested in (e.g closed is NULL)

I can improve like this:

Select tbltickets.ticketid, tbltickets.ticketsummary, ta2.*
from tblTickets 
Join (
   SELECT tblassignment.ticketid,
          max(tblassignment.allocationid) AS maxallocationid
   FROM tblassignment 
   join tbltickets on tblassignment.ticketid = tbltickets.ticketid
   where tbltickets.closed is null
   GROUP BY tblassignment.ticketid) as ta on tbltickets.ticketid = ta.ticketid
join tblassignment ta2 on ta.maxallocationid = ta2.allocationid
where tbltickets.closed is NULL;

It now runs in 170 msec which is fine, but I cannot then use in a view to make it reusable with different criteria (or at least I don't know how).

Is there a way to allow for filtering the tickets, then look up maxassignment only on returned tickets, preferable without reorting to functions/stored procedures!

I have considered using a trigger to update the ticket table with info from the assignment, but do not want to go there if I can avoid it as I am not familiar with this!

Best Answer

Looks like you want a (LEFT) JOIN LATERAL - the standard SQL equivalent of (OUTER) APPLY in SQL Server, as a_horse called to mind:

SELECT t.ticketid, t.ticketsummary, ta2.*
FROM   tblTickets t
LEFT   JOIN LATERAL (
   SELECT ta.allocationid
   FROM   tblassignment ta
   WHERE  ta.ticketid = t.ticketid
   ORDER  BY ta.allocationid DESC NULLS LAST  -- see below!
   LIMIT  1
   ) ta2 ON true
WHERE  t.closed is NULL;

The LEFT JOIN preserves rows from tblTickets without related rows in tblassignment. To eliminate those, use JOIN instead.

Be sure to have an index on tblassignment (ticketid, allocationid DESC NULLS LAST).
Drop NULLS LAST in query and index if allocationid is defined NOT NULL. (Index and query must agree on this for the index to be applicable.)

You can add any predicates to filter rows from tblTickets before the greatest allocationid is looked up, only for qualifying rows. (Maybe add index-support for common predicates on tblTickets ...)

This query should be very fast.

Related: