How Order of Operations Affects Query Results in PostgreSQL

postgresql

I've written a query, and I'm kinda sure that it does what it want. I'm thinking, "huh, I wonder if I know how Postgres works?". I'm going to list a specific assumption about this query below, and if that is wrong, please correct my understanding.

Assumption: The OVER (PARTITION BY) will execute against the data after the final GROUP BY and ORDER BY are executed, thereby limiting the records that are summed to those that are indeed DISTINCT ON (a.id, scl.new_status_id).

Note: This is a Python string that will be formatted into a proper SQL query using a Python library, hence the %s format characters. If that causes confusion for you, I will update it. Please assume that I am passing in arguments that make a valid SQL query.

SELECT DISTINCT ON (a.id, scl.new_status_id)
    scl.new_status_id
    ,(scl.change_date AT TIME ZONE %s)::date as d
    ,SUM(COUNT(*)) OVER (PARTITION BY scl.new_status_id
                         order by (scl.change_date AT TIME ZONE %s)::date)

FROM automated_responses_statuschangelogs scl
JOIN application_app a ON scl.applicant_id = a.id
WHERE a.applied_class in %s
AND a.applied_track in %s
AND a.applied_date <= %s
AND a.applied_date > %s
AND scl.new_status_id IN %s
AND scl.new_status_id NOT IN %s

GROUP BY a.id, scl.new_status_id, d
ORDER BY a.id, scl.new_status_id, d desc

Best Answer

Your assumptions are almost correct.
The order of events:

  1. Build table from FROM list
    • Apply join conditions and filters in WHERE clause.
  2. GROUP BY / aggregate functions
  3. Window functions
  4. ORDER BY
  5. DISTINCT ON (has to match leading columns of ORDER BY)
  6. LIMIT (not in your case).

Window functions operate logically independent of the ORDER BY clause. But there are synergies for performance if both use the same sort order.

There is not enough information to tell whether this query does what it is supposed to do.

Related answers to shed some more light on the topic: