I am looking for how to select multiple rows from one table based on the results from a subquery in PostgreSQL (10). I am not very familiar with advanced database concepts, so I might be missing something obvious here (i.e. my vocabulary might not be sufficient). I cannot see how to do it with PARTITION BY
, but maybe it is possible.
DB Fiddle: https://www.db-fiddle.com/f/w7EwvYujxg4wmL7wHkfBdD/0
CREATE TABLE devices(id INTEGER PRIMARY KEY, name VARCHAR UNIQUE);
CREATE TABLE results(id INTEGER PRIMARY KEY, test_name VARCHAR, results JSON, state VARCHAR);
CREATE TABLE results_for_device(results_id INTEGER, device_id INTEGER, PRIMARY KEY(results_id, device_id));
INSERT INTO devices(id, name) VALUES
(1, 'Dev1'),
(2, 'Dev2');
INSERT INTO results(id, test_name, results, state) VALUES
# Results thread for 'test1' and 'Dev1'
(1, 'test1', '{}', 'CURRENT_BLUEPRINT'), # Select all the below results in this section
(2, 'test1', '{}', 'MATCHING_BLUEPRINT'),
(3, 'test1', '{}', 'MATCHING_BLUEPRINT'),
(4, 'test1', '{}', 'NOT_MATCHING_BLUEPRINT'),
# Results thread for 'test1' and 'Dev2'
(5, 'test1', '{}', 'OLD_BLUEPRINT'),
(6, 'test1', '{}', 'NOT_MATCHING_BLUEPRINT'),
(7, 'test1', '{}', 'CURRENT_BLUEPRINT'), # Select all the below results in this section
(8, 'test1', '{}', 'MATCHING_BLUEPRINT'),
(9, 'test1', '{}', 'MATCHING_BLUEPRINT'),
# Results thread for 'test2' and 'Dev1'
(10, 'test2', '{}', 'OLD_BLUEPRINT'),
(11, 'test2', '{}', 'NOT_MATCHING_BLUEPRINT'),
(12, 'test2', '{}', 'CURRENT_BLUEPRINT'), # Select all the below results in this section
(13, 'test2', '{}', 'MATCHING_BLUEPRINT'),
(14, 'test2', '{}', 'MATCHING_BLUEPRINT'),
# Results thread for 'test2' and 'Dev2'
(15, 'test2', '{}', 'OLD_BLUEPRINT'),
(16, 'test2', '{}', 'CURRENT_BLUEPRINT'), # Select all the below results in this section
(17, 'test2', '{}', 'MATCHING_BLUEPRINT'),
(18, 'test2', '{}', 'MATCHING_BLUEPRINT'),
(19, 'test2', '{}', 'NOT_MATCHING_BLUEPRINT');
INSERT INTO results_for_device(results_id, device_id) VALUES
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 2),
(6, 2),
(7, 2),
(8, 2),
(9, 2),
(10, 1),
(11, 1),
(12, 1),
(13, 1),
(14, 1),
(15, 2),
(16, 2),
(17, 2),
(18, 2),
(19, 2);
# The desired results could be extracted by iterating the results of this subquery
# and querying for results >= results_id, but I want to do this in one go.
WITH result_threads AS (
SELECT r.id AS results_id,
d.id AS device_id
FROM results AS r
INNER JOIN results_for_device AS rfd ON rfd.results_id = r.id
INNER JOIN devices AS d ON d.id = rfd.device_id
WHERE r.state = 'CURRENT_BLUEPRINT'
)
SELECT * FROM result_threads; # WHAT TO DO HERE?
I have a subquery with multiple results (result_threads
). Then, for each of the results in the subquery, I want to select data from the results
table where id >= result_threads.id
, and maybe also section by / sort by device_id
and results_id
. That is, find the IDs with state 'CURRENT_BLUEPRINT'
(only 1 can exist at any point in time) and then select all results above that ID, for each result "thread".
Thinking in code I could first do the subquery, then iterate each row and perform one query per row with WHERE id > row.id
, but I want to do this in one go if possible.
This is the desired results as noted in comments when inserting the data:
-----------------------------------------------------------------
|device_id|results_id|test_name|results|state |
-----------------------------------------------------------------
| 1| 1| 'test1'| '{}'|'CURRENT_BLUEPRINT' |
| 1| 2| 'test1'| '{}'|'MATCHING_BLUEPRINT' |
| 1| 3| 'test1'| '{}'|'MATCHING_BLUEPRINT' |
| 1| 4| 'test1'| '{}'|'NOT_MATCHING_BLUEPRINT'|
| 2| 7| 'test1'| '{}'|'CURRENT_BLUEPRINT' |
| 2| 8| 'test1'| '{}'|'MATCHING_BLUEPRINT' |
| 2| 9| 'test1'| '{}'|'MATCHING_BLUEPRINT' |
| 1| 12| 'test2'| '{}'|'CURRENT_BLUEPRINT' |
| 1| 13| 'test2'| '{}'|'MATCHING_BLUEPRINT' |
| 1| 14| 'test2'| '{}'|'MATCHING_BLUEPRINT' |
| 2| 16| 'test2'| '{}'|'CURRENT_BLUEPRINT' |
| 2| 17| 'test2'| '{}'|'MATCHING_BLUEPRINT' |
| 2| 18| 'test2'| '{}'|'MATCHING_BLUEPRINT' |
| 2| 19| 'test2'| '{}'|'NOT_MATCHING_BLUEPRINT'|
-----------------------------------------------------------------
Best Answer
Add
test_name
to the CTE and your result threads will be fully defined. Now you can use the CTE output to generate the desired results:The part in bold shows the way you join
result_threads
to get your output. Basically,result_threads
defines each portion of the output bydevice_id
,test_name
and the startingresults_id
. You can see how the joining predicate forresult_threads
is using all three defining points.You may also have noticed that the
device
table is removed from the CTE definition. It is not necessary if the PK/FK relationship betweenresults_for_device
anddevice
is formally defined and thedevice_id
column inresults_for_device
is not nullable. Otherwise, of course, you will need to include it back. It will also be required if the initial set needs to be additionally filtered by the device name.Alternative
If performance is an issue, you could also try this alternative version, which attempts to solve the problem without hitting the underlying tables more than once:
In this solution, a starting and ending row, in the form of computed columns
this_id
andmax_id
, are defined for each distinct section of(device_id, test_name)
. The outer query uses the computed columns to filter theresults_id
value on.Both solutions can be tested at db<>fiddle.uk.