Postgresql – Gather values from different related rows with JSONB fields

jsonpostgresqlquery

Say I have the following table structure:

jobs
+----+------------+
| id | some_field |
+----+------------+
|  1 |  some_val  |
|  2 |  some_val  |
|  3 |  some_val  |
+----+------------+

events
+----+--------+----------------------------------------+
| id | job_id |            payload (JSONB)             |
+----+--------+----------------------------------------+
|  1 |      1 | {'type':1, 'a':'some_val', ...}        |
|  2 |      1 | {'type':2, 'c':'some_other_val1', ...} |
|  3 |      2 | {'type':1, 'a':'some_other_val2', ...} |
|  4 |      2 | {'type':1, 'a':'some_other_val3', ...} |
|  5 |      3 | {'type':1, 'a':'some_other_val4', ...} |
|  6 |      3 | {'type':1, 'a':'some_other_val5', ...} |
|  7 |      3 | {'type':2, 'c':'some_other_val6', ...} |
|  8 |      3 | {'type':3, 'd':'some_other_val7', ...} |
+----+--------+----------------------------------------+

I want to create a query/view that …

  • … selects one row for each job IF it has an event with payload->>'type' = 2
  • … in that row, shows values from other events that are associated with this job.
  • … in that row, shows values from other events that are associated with this job if that event type occurs multiple times.

For example, with the above data I'd like to select all jobs that have an event of type 2, and find out the value of the d field from events with type 3 and some aggregate (say, COUNT()) of the a field from events with type 1. The result would be:

Results
+--------+------------------+----------+
| job_id |        d         | COUNT(a) |
+--------+------------------+----------+
|      1 |  NULL            |        1 |
|      3 |  some_other_val7 |        2 |
+--------+------------------+----------+

As a bonus: Each job can have only one event of type 2, so the basic query can be structured like:

SELECT
  job.id
FROM
  event
LEFT JOIN 
    job ON event.job_id = job.id
WHERE
  event.payload ->> 'type' = 2;

How can I get the result table? Do I need to do sub queries for each field/event type, which would result in relatively bad performance, or can I use some GROUP BY job_id on the event table and maybe join the JSONs? (Note that my data is more complex and has many more fields in the payload that I would like to select).


Here is a DB Fiddle with the example data and my test query:

CREATE TABLE job (
      id SERIAL PRIMARY KEY,
      some_field TEXT
);

CREATE TABLE event (
      id SERIAL PRIMARY KEY,
      job_id INTEGER,
      payload JSONB
);

INSERT INTO job (some_field) VALUES ('val1'), ('val2'), ('val3');
INSERT INTO event (job_id, payload) VALUES 
    (1, '{"type":1, "a":"some_val"}'::json),
    (1, '{"type":2, "c":"some_other_val1"}'::json),
    (2, '{"type":1, "a":"some_other_val2"}'::json),
    (2, '{"type":1, "a":"some_other_val3"}'::json),
    (3, '{"type":1, "a":"some_other_val4"}'::json),
    (3, '{"type":1, "a":"some_other_val5"}'::json),
    (3, '{"type":2, "c":"some_other_val6"}'::json),
    (3, '{"type":3, "d":"some_other_val7"}'::json);

SELECT job.id, job.some_field 
FROM event 
LEFT JOIN  job 
     ON event.job_id = job.id WHERE event.payload ->> 'type' = '2';
id | some_field
-: | :---------
 1 | val1      
 3 | val3      

db<>fiddle here

Best Answer

As per comments, you need a timestamp field and I've added it to your sample data. Now IMHO you should first unnest your jsonb data. I've used a SUM(CASE for this purpose and it calculates too the minimum timestamp value.

SELECT 
    event.job_id, 
    job.some_field,
    event.payload->>'type' AS type, 
    SUM(CASE WHEN event.payload->>'a' IS NOT NULL THEN 1 ELSE 0 END) AS a,
    SUM(CASE WHEN event.payload->>'b' IS NOT NULL THEN 1 ELSE 0 END) AS b,
    SUM(CASE WHEN event.payload->>'c' IS NOT NULL THEN 1 ELSE 0 END) AS c,
    SUM(CASE WHEN event.payload->>'d' IS NOT NULL THEN 1 ELSE 0 END) AS d,
    MIN((event.payload->>'ts'::text)::timestamp) as min_ts
FROM   
    event
JOIN
    job
    ON event.job_id = job.id
GROUP BY 
    event.job_id, 
    job.some_field,
    event.payload->>'type'
ORDER BY
    1, 2;

This is the result:

job_id | some_field | type |  a |  b |  c |  d | min_ts             
-----: | :--------- | :--- | -: | -: | -: | -: | :------------------
     1 | val1       | 1    |  1 |  0 |  0 |  0 | 2004-10-19 10:23:54
     1 | val1       | 2    |  0 |  0 |  1 |  0 | 2002-10-19 10:23:54
     2 | val2       | 1    |  2 |  0 |  0 |  0 | 2003-10-19 10:23:54
     3 | val3       | 1    |  2 |  0 |  0 |  0 | 2004-10-19 10:23:54
     3 | val3       | 2    |  0 |  0 |  1 |  0 | 2005-10-19 10:23:54
     3 | val3       | 3    |  0 |  0 |  0 |  1 | 2006-10-19 10:23:54

db<>fiddle here

Starting at this point you can easily filter it by type and get the desired result.