Postgresql – Select results in one table from multiple rows in subquery

postgresql

I am looking for how to select multiple rows from one table based on the results from a subquery in PostgreSQL (10). I am not very familiar with advanced database concepts, so I might be missing something obvious here (i.e. my vocabulary might not be sufficient). I cannot see how to do it with PARTITION BY, but maybe it is possible.

DB Fiddle: https://www.db-fiddle.com/f/w7EwvYujxg4wmL7wHkfBdD/0

CREATE TABLE devices(id INTEGER PRIMARY KEY, name VARCHAR UNIQUE);
CREATE TABLE results(id INTEGER PRIMARY KEY, test_name VARCHAR, results JSON, state VARCHAR);
CREATE TABLE results_for_device(results_id INTEGER, device_id INTEGER, PRIMARY KEY(results_id, device_id));

INSERT INTO devices(id, name) VALUES
    (1, 'Dev1'),
    (2, 'Dev2');

INSERT INTO results(id, test_name, results, state) VALUES

    # Results thread for 'test1' and 'Dev1'
    (1, 'test1', '{}', 'CURRENT_BLUEPRINT'),       # Select all the below results in this section
    (2, 'test1', '{}', 'MATCHING_BLUEPRINT'),
    (3, 'test1', '{}', 'MATCHING_BLUEPRINT'),
    (4, 'test1', '{}', 'NOT_MATCHING_BLUEPRINT'),

    # Results thread for 'test1' and 'Dev2'
    (5, 'test1', '{}', 'OLD_BLUEPRINT'),
    (6, 'test1', '{}', 'NOT_MATCHING_BLUEPRINT'),
    (7, 'test1', '{}', 'CURRENT_BLUEPRINT'),       # Select all the below results in this section
    (8, 'test1', '{}', 'MATCHING_BLUEPRINT'),
    (9, 'test1', '{}', 'MATCHING_BLUEPRINT'),

    # Results thread for 'test2' and 'Dev1'
    (10, 'test2', '{}', 'OLD_BLUEPRINT'),
    (11, 'test2', '{}', 'NOT_MATCHING_BLUEPRINT'),
    (12, 'test2', '{}', 'CURRENT_BLUEPRINT'),      # Select all the below results in this section
    (13, 'test2', '{}', 'MATCHING_BLUEPRINT'),
    (14, 'test2', '{}', 'MATCHING_BLUEPRINT'),

    # Results thread for 'test2' and 'Dev2'
    (15, 'test2', '{}', 'OLD_BLUEPRINT'),
    (16, 'test2', '{}', 'CURRENT_BLUEPRINT'),      # Select all the below results in this section
    (17, 'test2', '{}', 'MATCHING_BLUEPRINT'),
    (18, 'test2', '{}', 'MATCHING_BLUEPRINT'),
    (19, 'test2', '{}', 'NOT_MATCHING_BLUEPRINT');

INSERT INTO results_for_device(results_id, device_id) VALUES
    (1, 1),
    (2, 1),
    (3, 1),
    (4, 1),
    (5, 2),
    (6, 2),
    (7, 2),
    (8, 2),
    (9, 2),
    (10, 1),
    (11, 1),
    (12, 1),
    (13, 1),
    (14, 1),
    (15, 2),
    (16, 2),
    (17, 2),
    (18, 2),
    (19, 2);

# The desired results could be extracted by iterating the results of this subquery
# and querying for results >= results_id, but I want to do this in one go.
WITH result_threads AS (
    SELECT  r.id AS results_id,
            d.id AS device_id
    FROM    results AS r
    INNER JOIN  results_for_device AS rfd ON rfd.results_id = r.id
    INNER JOIN  devices AS d ON d.id = rfd.device_id
    WHERE r.state = 'CURRENT_BLUEPRINT'
)
SELECT * FROM result_threads; # WHAT TO DO HERE?

I have a subquery with multiple results (result_threads). Then, for each of the results in the subquery, I want to select data from the results table where id >= result_threads.id, and maybe also section by / sort by device_id and results_id. That is, find the IDs with state 'CURRENT_BLUEPRINT' (only 1 can exist at any point in time) and then select all results above that ID, for each result "thread".

Thinking in code I could first do the subquery, then iterate each row and perform one query per row with WHERE id > row.id, but I want to do this in one go if possible.

This is the desired results as noted in comments when inserting the data:

-----------------------------------------------------------------
|device_id|results_id|test_name|results|state                   |
-----------------------------------------------------------------
|        1|         1|  'test1'|   '{}'|'CURRENT_BLUEPRINT'     |
|        1|         2|  'test1'|   '{}'|'MATCHING_BLUEPRINT'    |
|        1|         3|  'test1'|   '{}'|'MATCHING_BLUEPRINT'    |
|        1|         4|  'test1'|   '{}'|'NOT_MATCHING_BLUEPRINT'|
|        2|         7|  'test1'|   '{}'|'CURRENT_BLUEPRINT'     |
|        2|         8|  'test1'|   '{}'|'MATCHING_BLUEPRINT'    |
|        2|         9|  'test1'|   '{}'|'MATCHING_BLUEPRINT'    |
|        1|        12|  'test2'|   '{}'|'CURRENT_BLUEPRINT'     |
|        1|        13|  'test2'|   '{}'|'MATCHING_BLUEPRINT'    |
|        1|        14|  'test2'|   '{}'|'MATCHING_BLUEPRINT'    |
|        2|        16|  'test2'|   '{}'|'CURRENT_BLUEPRINT'     |
|        2|        17|  'test2'|   '{}'|'MATCHING_BLUEPRINT'    |
|        2|        18|  'test2'|   '{}'|'MATCHING_BLUEPRINT'    |
|        2|        19|  'test2'|   '{}'|'NOT_MATCHING_BLUEPRINT'|
-----------------------------------------------------------------

Best Answer

Add test_name to the CTE and your result threads will be fully defined. Now you can use the CTE output to generate the desired results:

WITH
  result_threads AS
  (
    SELECT
      rfd.*,
      r.test_name
    FROM
      results_for_device AS rfd
      INNER JOIN results AS r ON rfd.results_id = r.id
    WHERE
      r.state = 'CURRENT_BLUEPRINT' 
  )
SELECT
  rfd.device_id,
  rfd.results_id,
  r.test_name,
  r.results,
  r.state
FROM
  results_for_device AS rfd
  INNER JOIN results AS r ON rfd.results_id = r.id
  INNER JOIN result_threads AS rt
    ON  rfd.device_id = rt.device_id
    AND r.test_name = rt.test_name
    AND r.id >= rt.results_id
;

The part in bold shows the way you join result_threads to get your output. Basically, result_threads defines each portion of the output by device_id, test_name and the starting results_id. You can see how the joining predicate for result_threads is using all three defining points.

You may also have noticed that the device table is removed from the CTE definition. It is not necessary if the PK/FK relationship between results_for_device and device is formally defined and the device_id column in results_for_device is not nullable. Otherwise, of course, you will need to include it back. It will also be required if the initial set needs to be additionally filtered by the device name.

Alternative

If performance is an issue, you could also try this alternative version, which attempts to solve the problem without hitting the underlying tables more than once:

SELECT
  device_id,
  results_id,
  test_name,
  results,
  state
FROM
  (
    SELECT
      rfd.device_id,
      rfd.results_id,
      r.test_name,
      r.results,
      r.state,
      MAX(r.id) FILTER (WHERE r.state = 'CURRENT_BLUEPRINT') OVER w AS this_id,
      MAX(r.id)                                              OVER w AS max_id
    FROM
      results_for_device AS rfd
      INNER JOIN results AS r ON rfd.results_id = r.id
    WINDOW
      w AS (PARTITION BY rfd.device_id, r.test_name)
  ) AS derived
WHERE
  results_id BETWEEN this_id AND max_id
;

In this solution, a starting and ending row, in the form of computed columns this_id and max_id, are defined for each distinct section of (device_id, test_name). The outer query uses the computed columns to filter the results_id value on.


Both solutions can be tested at dbfiddle logodb<>fiddle.uk.