Postgresql – Improve query performance of subquery with MAX()

Tags: greatest-n-per-group, index, join, postgresql

I am looking to migrate our support database from MS SQL Server to Postgres and most of it has gone OK. However, some of the queries seem to be slower on PG than on the SQL Server.

We have a tickets table with a related assignments table. In the assignments table (linked by ticketid) we can add comments and assign the ticket to another group to work on. To find out who currently holds a ticket, we need the latest assignment for it:

Select tbltickets.ticketid, tbltickets.ticketsummary, ta2.*
from tblTickets 
Join (
   SELECT tblassignment.ticketid,
          max(tblassignment.allocationid) AS maxallocationid
   FROM tblassignment 
   join tbltickets on tblassignment.ticketid = tbltickets.ticketid
   GROUP BY tblassignment.ticketid) as ta on tbltickets.ticketid = ta.ticketid
join tblassignment ta2 on ta.maxallocationid = ta2.allocationid
where tbltickets.closed is NULL;

It works, but at around 2 seconds it is slower than we would like, and it will only get slower.

The time goes into the maxallocationid subquery (about 1.5 s), because it computes the maximum for all assignments, not just for the tickets we are interested in (e.g. closed IS NULL).

I can improve it like this:

Select tbltickets.ticketid, tbltickets.ticketsummary, ta2.*
from tblTickets 
Join (
   SELECT tblassignment.ticketid,
          max(tblassignment.allocationid) AS maxallocationid
   FROM tblassignment 
   join tbltickets on tblassignment.ticketid = tbltickets.ticketid
   where tbltickets.closed is null
   GROUP BY tblassignment.ticketid) as ta on tbltickets.ticketid = ta.ticketid
join tblassignment ta2 on ta.maxallocationid = ta2.allocationid
where tbltickets.closed is NULL;

It now runs in 170 ms, which is fine, but I cannot then use it in a view to make it reusable with different criteria (or at least I don't know how).

Is there a way to filter the tickets first and then look up the max assignment only for the returned tickets, preferably without resorting to functions/stored procedures?

I have considered using a trigger to update the ticket table with info from the assignment, but do not want to go there if I can avoid it as I am not familiar with this!

Best Answer

Looks like you want a (LEFT) JOIN LATERAL - the standard SQL equivalent of (OUTER) APPLY in SQL Server, as a_horse called to mind:

SELECT t.ticketid, t.ticketsummary, ta2.*
FROM   tblTickets t
LEFT   JOIN LATERAL (
   SELECT ta.*  -- whole assignment row, so ta2.* matches the question's output
   FROM   tblassignment ta
   WHERE  ta.ticketid = t.ticketid
   ORDER  BY ta.allocationid DESC NULLS LAST  -- see below!
   LIMIT  1
   ) ta2 ON true
WHERE  t.closed is NULL;

The LEFT JOIN preserves rows from tblTickets without related rows in tblassignment. To eliminate those, use JOIN instead.

Be sure to have an index on tblassignment (ticketid, allocationid DESC NULLS LAST).
Drop NULLS LAST in query and index if allocationid is defined NOT NULL. (Index and query must agree on this for the index to be applicable.)
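
A minimal sketch of the index described above, assuming allocationid can be NULL. The index name is made up for illustration; only the table and column names come from the question.

```sql
-- Hypothetical index name. Column order and sort order mirror the
-- ORDER BY of the LATERAL subquery, so the newest assignment per
-- ticket can be read from the first matching index entry.
CREATE INDEX tblassignment_ticket_alloc_idx
    ON tblassignment (ticketid, allocationid DESC NULLS LAST);

-- Variant if allocationid is declared NOT NULL
-- (then also drop NULLS LAST from the ORDER BY in the query):
-- CREATE INDEX tblassignment_ticket_alloc_idx
--     ON tblassignment (ticketid, allocationid DESC);
```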

You can add any predicates to filter rows from tblTickets before the greatest allocationid is looked up, only for qualifying rows. (Maybe add index-support for common predicates on tblTickets ...)
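
One possible form of index support for the closed IS NULL predicate is a partial index. This is a sketch under the assumption that open tickets are a small fraction of tblTickets; the index name is hypothetical.

```sql
-- Hypothetical index name. The index contains only open tickets, so it
-- stays small and can serve queries that filter on "closed IS NULL".
CREATE INDEX tbltickets_open_idx
    ON tbltickets (ticketid)
    WHERE closed IS NULL;
```

Whether the planner actually uses it depends on table statistics, so it is worth checking with EXPLAIN.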

This query should be very fast.
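
Since the question asks for something reusable in a view, here is a sketch of how the LATERAL query might be wrapped in one. The view name is an assumption, and columns are listed explicitly because a view cannot expose two columns that are both named ticketid.

```sql
-- Hypothetical view name. The view itself does not filter tickets;
-- callers add their own predicates.
CREATE VIEW ticket_latest_assignment AS
SELECT t.ticketid, t.ticketsummary, t.closed, ta2.allocationid
       -- add further tblassignment columns here by name
FROM   tbltickets t
LEFT   JOIN LATERAL (
   SELECT ta.allocationid
   FROM   tblassignment ta
   WHERE  ta.ticketid = t.ticketid
   ORDER  BY ta.allocationid DESC NULLS LAST
   LIMIT  1
   ) ta2 ON true;

-- Example use: open tickets only.
SELECT * FROM ticket_latest_assignment WHERE closed IS NULL;
```

The expectation is that a predicate on a tblTickets column is applied before the subquery runs for each remaining row; EXPLAIN ANALYZE on real data is the way to confirm that.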

Sql-server – How should row-specific metadata be created/handled for an outer join view

If you're just trying to do a batched update to attach a bunch of tickets to an invoice (i.e. Invoices and tickets live in a 1:M relationship) then you have to get a list of ticket IDs into the stored procedure in the first place.

You could use a table variable as an input parameter if you have a version of SQL Server that supports these. Otherwise, you would have to encode the list of ticket IDs in some way, for example a list serialised as a string. You could encode the list as XML if you don't mind eternal damnation :)
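
A sketch of the table-valued parameter approach, assuming SQL Server 2008 or later. Every object name here (dbo.TicketIdList, dbo.Ticket, dbo.AttachTicketsToInvoice, InvoiceId) is hypothetical, since the schema is not given.

```sql
-- Hypothetical table type that carries the list of ticket IDs.
CREATE TYPE dbo.TicketIdList AS TABLE (TicketId int NOT NULL PRIMARY KEY);
GO

-- Hypothetical procedure: attaches every listed ticket to one invoice.
CREATE PROCEDURE dbo.AttachTicketsToInvoice
    @InvoiceId int,
    @TicketIds dbo.TicketIdList READONLY  -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    UPDATE t
    SET    InvoiceId = @InvoiceId
    FROM   dbo.Ticket AS t
    JOIN   @TicketIds AS ids ON ids.TicketId = t.TicketId;
END;
GO
```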

If you're trying to add an invoice record to a batch of tickets (i.e. tickets live in a 1:0-1 relationship with invoices) then you have the same options for input.

In both cases the most efficient way to do the updates is likely to be creating a table variable with your list of invoice changes and then doing an update/insert operation joining the table variable against the main tables.
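
For the 1:0-1 case, the update/insert join against a table variable could look roughly like this. Table and column names (dbo.Invoice, TicketId, InvoiceNo) and the sample rows are made up for illustration, and the multi-row VALUES list assumes SQL Server 2008 or later.

```sql
-- Hypothetical table variable holding the invoice changes.
DECLARE @InvoiceChanges TABLE (
    TicketId  int         NOT NULL PRIMARY KEY,
    InvoiceNo varchar(20) NOT NULL
);

INSERT INTO @InvoiceChanges (TicketId, InvoiceNo)
VALUES (101, 'INV-0001'), (102, 'INV-0002');  -- made-up sample rows

-- Update invoices that already exist for a ticket.
UPDATE i
SET    InvoiceNo = c.InvoiceNo
FROM   dbo.Invoice AS i
JOIN   @InvoiceChanges AS c ON c.TicketId = i.TicketId;

-- Insert invoices for tickets that have none yet.
INSERT INTO dbo.Invoice (TicketId, InvoiceNo)
SELECT c.TicketId, c.InvoiceNo
FROM   @InvoiceChanges AS c
WHERE  NOT EXISTS (SELECT 1 FROM dbo.Invoice AS i
                   WHERE  i.TicketId = c.TicketId);
```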

Off the top of my head, I can't think of anything more elegant than that.

If your problem is passing the list across from the client, then you can pass in table-valued parameters in recent versions of ADO.NET. In older versions you are basically in the business of scrubbing and escaping the identifiers client side and passing the list across serialised as a string or XML (supported from .NET 2.0 and SQL Server 2005 onwards).

If you're using anything else (e.g. PHP) then you're probably going to have to drop down to the lowest common denominator - scrub the list of tickets and encode them as a string. Maybe the stored procedure can do some validation, such as checking the tickets all belong to the correct user.

Mysql – Limit WHERE to MAX() & COUNT()

Here is your original query from the question

SELECT e.*, MAX(m.datetime) AS unread_last, COUNT(m.id) AS unread 
FROM TAB_EVENT e 
LEFT JOIN TAB_MESSAGE m ON e.id=m.event_id 
WHERE ( m.`read` IS NULL OR m.`read` = 0) 
GROUP BY e.id 
ORDER BY m.datetime DESC, e.id ASC 
LIMIT 10;

Maybe try refactoring the query so that it executes in this sequence:

only collect necessary columns from TAB_MESSAGE
apply LIMIT 10 against the collected rows from TAB_MESSAGE
run the JOIN
apply the MAX() and COUNT() last

Here is what I am proposing

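SELECT e.*, MAX(m.datetime) AS unread_last, COUNT(m.id) AS unread
FROM
(
    -- newest 10 unread messages; ORDER BY and LIMIT sit at the same
    -- level because a sort inside a derived table is not guaranteed
    -- to survive an outer LIMIT
    SELECT id, event_id, datetime FROM TAB_MESSAGE
    WHERE `read` IS NULL OR `read` = 0  -- READ is a reserved word in MySQL
    ORDER BY datetime DESC
    LIMIT 10
) m
LEFT JOIN TAB_EVENT e
ON e.id = m.event_id
GROUP BY e.id  -- one row per event for MAX() and COUNT()
ORDER BY unread_last DESC, e.id ASC;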

Give it a Try !!!

UPDATE 2012-02-21 17:06 EDT

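SELECT e.*, MAX(m.datetime) AS unread_last, COUNT(m.id) AS unread
FROM
TAB_EVENT e LEFT JOIN
(
    -- newest 10 unread messages; ORDER BY and LIMIT at the same level
    SELECT id, event_id, datetime FROM TAB_MESSAGE
    WHERE `read` IS NULL OR `read` = 0  -- READ is a reserved word in MySQL
    ORDER BY datetime DESC
    LIMIT 10
) m
ON e.id = m.event_id
GROUP BY e.id  -- one row per event for MAX() and COUNT()
ORDER BY unread_last DESC, e.id ASC;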

@Sebastian, I put the query back in the original join order. Please try this as well !!!

UPDATE 2012-02-21 17:11 EDT

Make sure the datetime field is indexed

ALTER TABLE TAB_MESSAGE ADD INDEX read_datetime_ndx (`read`, datetime);
