Postgresql: get last/newest value for each group of rows

postgresql

I have the following notifications table in postgresql:

- notification_id, bigint
- group_key, character varying(24)
- associated, jsonb object
- timestamp

The table now looks like this

+------------------------------------------------------+
| notification_id | group_key | associated | timestamp |
+------------------------------------------------------+
| 1               | key1      | {123}      | ...       |
| 2               | key1      | {456}      | ...       |
| 3               | key2      | {789}      | ...       |
+------------------------------------------------------+

The following query gives the following results:

SELECT
    a.group_key,
    MAX(a.notification_id) as max,
    COUNT(a.notification_id) as total
FROM
    data.notifications a
GROUP BY
    a.group_key
ORDER BY
    MAX(a.notification_id) DESC

+-------------------------+
| group_key | max | total |
+-------------------------+
| key1      | 2   | 2     |
| key2      | 3   | 1     |
+-------------------------+

Explanation:
the notifications with ID 1 and 2 belong to the same notification group (key1). I want to fetch the max notification ID for the group (in this case 2), the number of notifications in the group (in this case also 2).

What I want now is also the content of the column 'associated' of the last notification, so in this case from notification with ID 2 (so in this case '{456}').

This column contains a jsonb object with various keys. I want to do joins from other tables with those keys, but I don't know how to get the last JSONB object for each notification group.

I tried using LAST_VALUES but I constantly get errors in pgadmin, saying that the column "associated" should be in the group by clause, but when I do this, I don't get the correct resultset returned.

SELECT
    a.group_key,
    MAX(a.notification_id) as max,
    COUNT(a.notification_id) as total,
    LAST_VALUE(a.associated) OVER (PARTITION BY a.group_key ORDER BY MAX(a.notification_id) DESC) as associated
FROM
    data.notifications a
GROUP BY
    a.group_key
ORDER BY
    MAX(a.notification_id) DESC

Best Answer

This is a classic case where using the ROW_NUMBER() window function can be used to get the latest row within a grouping (PARTITION) like so:

WITH CTE_Notifications_Sorted AS
(
    SELECT notification_id, group_key, associated, timestamp, 
    COUNT(notification_id) OVER (PARTITION BY group_key) AS notification_id_count,
ROW_NUMBER() OVER (PARTITION BY group_key ORDER BY notification_id DESC) AS PartitionSortId -- Generates a unique ID for each row within the grouping (partition) of group_key ordered by the notification_id descending
    FROM data.notifications
)

SELECT group_key, notification_id AS notification_id_max, notification_id_count AS total, associated, timestamp
FROM CTE_Notifications_Sorted
WHERE PartitionSortId = 1 -- Filter out everything but the latest row of each group_key partition

By using the COUNT() function in a window function as well in the above CTE, you're able to remove your GROUP BY clause in the final SELECT.

Now that you have the associated column for the correct row, if you need to parse out a specific value you can follow this StackOverflow answer or leverage one of these JSON Functions and Operators.