PostgreSQL – Postgres: multiple selects that exclude results from previous ones

postgresql

I have a table of widgets, each with a number of connections whose supported modes may overlap. For instance, widget A may have 10 connections, of which 5 can run in mode A and all 10 can run in mode B. Every connection supports at least one mode. I'm trying to design a query that returns the widgets that support a given number of connections with particular modes.

The simplified schema looks like this:

table widgets
  id
  name

table connections
  id
  widget_id

table modes_connections
  connection_id
  mode_id

table modes
  id
  name

I need to return widget_ids that satisfy filters similar to:

2 connections with mode A AND
2 connections with mode B AND
1 connection with mode C

I can't just join everything together, because connections matched by the mode A filter must be excluded from the other filters; similarly, connections matched by the mode B filter must be excluded from the mode C filter, and so on.

Also, I'm not sure how to prioritize so that connections with the fewest modes are used first. Consider a widget with 3 connections that support modes A, B, and C and 2 that support only mode B. In the filter example above, the mode B filter should select the only-mode-B connections, leaving the A,B,C connections to satisfy the requirements for modes A and C.

I'm totally at a dead end. Any suggestions or pointers would be appreciated. Redesigning the schema is also an option.

Best Answer

create table widgets (id int, name text);
create table connections(id int, widget_id int);
create table modes_connections(connection_id int, mode_id int);
create table modes (id int, name text);

insert into widgets values
(1, 'widget1'), (2, 'widget2'),(3, 'widget3'),(4, 'widget4');

insert into connections values
(1, 1),(2, 1),(3, 1),(4,1),(5,1),(6,1),(7,1),(8,1),(9,1),(10,1),
(11, 2),(12, 2),(13, 3),(14, 3),(15, 4);

insert into modes_connections values
(1, 1),(1, 2),(1, 3),(1, 4),(1, 5),
(2, 1),(2, 2),(2, 3),(2, 4),(2, 5),
(2, 6),(2, 7),(2, 8),(2, 9),(2, 10),
(3, 5),(3, 6),(3, 7),(3, 8),(3, 9);

insert into modes values 
(1, 'A'),(2, 'B'),(3, 'C'),(4, 'D');

select w.id, w.name, m.id, m.name, count(c.id) connections
from   widgets w
join   connections c
on     c.widget_id = w.id
join   modes_connections mc
on     mc.connection_id = c.id
join   modes m
on     m.id = mc.mode_id
group by w.id, w.name, m.id, m.name 
;

id | name    | id | name | connections
-: | :------ | -: | :--- | ----------:
 1 | widget1 |  1 | A    |           2
 1 | widget1 |  2 | B    |           2
 1 | widget1 |  3 | C    |           2
 1 | widget1 |  4 | D    |           2

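You can then filter widgets by putting that aggregate in a CTE (named x in the query that follows) and adding one condition per required mode to the WHERE clause, using the format: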

exists (select 1 from x where id = wdg.id and ......)

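-- Aggregate connections per widget and mode, then require a count per mode.
-- Note: modes are counted independently, so a connection that supports
-- both A and B counts toward both conditions.
with x as
(
select w.id, w.name, m.id mode_id, m.name mode_name, count(c.id) connections
from   widgets w
join   connections c
on     c.widget_id = w.id
join   modes_connections mc
on     mc.connection_id = c.id
join   modes m
on     m.id = mc.mode_id
group by w.id, w.name, m.id, m.name
)
select id, name
from   widgets wdg
-- ">=" so that a widget with more than the required connections also matches
where  exists (select 1 from x where x.id = wdg.id and x.mode_id = 1 and x.connections >= 2)
and    exists (select 1 from x where x.id = wdg.id and x.mode_id = 2 and x.connections >= 2);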

id | name   
-: | :------
 1 | widget1
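
The per-mode counts above are computed independently, so a connection that supports both A and B is counted toward both filters, whereas the question wants each connection to satisfy at most one filter. A minimal sketch of that stricter check, assuming the smaller filter "2 connections with mode A AND 2 connections with mode B" and the sample tables above, is to require four distinct connections of the same widget through self-joins. The aliases a1, a2, b1, and b2 are illustrative only, ids are assumed to be non-null, and this brute-force form grows quickly with the number of required connections, so treat it as a starting point rather than a general solution.

```sql
-- Sketch: a widget qualifies only if four DISTINCT connections exist,
-- two supporting mode A (mode_id = 1) and two supporting mode B (mode_id = 2).
select w.id, w.name
from   widgets w
where  exists (
    select 1
    from   connections a1
    join   connections a2 on a2.widget_id = a1.widget_id
                         and a2.id > a1.id                  -- distinct pair for A
    join   connections b1 on b1.widget_id = a1.widget_id
                         and b1.id not in (a1.id, a2.id)    -- not already used for A
    join   connections b2 on b2.widget_id = a1.widget_id
                         and b2.id > b1.id                  -- distinct pair for B
                         and b2.id not in (a1.id, a2.id)
    where  a1.widget_id = w.id
    -- a1 and a2 must support mode A
    and    exists (select 1 from modes_connections mc
                   where mc.connection_id = a1.id and mc.mode_id = 1)
    and    exists (select 1 from modes_connections mc
                   where mc.connection_id = a2.id and mc.mode_id = 1)
    -- b1 and b2 must support mode B
    and    exists (select 1 from modes_connections mc
                   where mc.connection_id = b1.id and mc.mode_id = 2)
    and    exists (select 1 from modes_connections mc
                   where mc.connection_id = b2.id and mc.mode_id = 2)
);
```

Against the sample data this returns no rows, because the only two connections of widget1 that support mode A are the same two that support mode B. Since EXISTS considers every combination of connections, a yes/no filter like this needs no explicit preference for connections with fewer modes.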


Related Solutions

PostgreSQL – Store table names to reference other tables

Storing table names in other (user) tables is an instance of storing structural metadata and user data side by side. This has a variety of interesting effects, and some of those effects make it attractive to designers. The downside is less apparent.

The downside is this: you end up doing a lot of data management "by hand" in SQL that a different design might have allowed the DBMS to do for you. This makes your code harder to maintain and slower to run.

Having said that, I'll admit to having pulled this stunt several times, and the results were usually good enough so that I didn't regret the choice.

There's a second thing going on in your case. Types 1 and 2 of both widgets and products are cases of class/subclass modeling (or, if you prefer, type/subtype modeling). This kind of thing is simple and straightforward in object modeling, because inheritance takes care of most of the difficulties for you. Not so in relational modeling. Relational modeling, as such, has no mechanism for inheritance. Some variants of SQL have extensions that make inheritance easier to model and to implement.

Many questions here in the DBA area boil down to how to implement subclasses (or subtypes) in SQL tables. Some of those questions are grouped under this tag: subtypes. Over on StackOverflow, there are even more such questions, and there are three tags that relate to three design techniques that might help: single-table-inheritance, class-table-inheritance, and shared-primary-key.

Your design resembles a class table inheritance design, except that you use embedded table names instead of shared primary key to implement the IS-A relationships between subclasses and classes. You might want to explore using shared primary key, and then creating views that collect all the data for each subclass by joining the superclass table with each subclass. I'm not sure you want to go this way. It could get awfully unwieldy once you have hundreds of different product types.
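
As a rough illustration of the shared primary key idea, here is a minimal sketch. The table, column, and view names (products, products_type1, type1_attr, products_type1_full) are hypothetical and not taken from the original schema.

```sql
-- Superclass table: one row per product of any type (hypothetical columns).
CREATE TABLE products (
    product_id integer PRIMARY KEY,
    name       text NOT NULL
);

-- Subclass table: its primary key is also a foreign key to the superclass,
-- which is what implements the IS-A relationship.
CREATE TABLE products_type1 (
    product_id integer PRIMARY KEY REFERENCES products (product_id),
    type1_attr text
);

-- View that collects all the data for one subclass.
CREATE VIEW products_type1_full AS
SELECT p.product_id, p.name, t.type1_attr
FROM   products AS p
JOIN   products_type1 AS t ON t.product_id = p.product_id;
```

Each additional subclass would need its own table and view of this shape, which is where the design can become unwieldy with many product types.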

PostgreSQL – Omit max entry from each group in postgres

My first thought would be to use a window function like ROW_NUMBER(), almost identical to your solution.
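
The original ROW_NUMBER() solution is not reproduced here, so the following is only a sketch of what such a query might look like, assuming mytable has the columns a_key, a_value, and a_timestamp used in the other queries. The alias ranked and the column rn are illustrative names.

```sql
-- Number the rows of each a_value group from newest to oldest,
-- then keep everything except the newest row (rn = 1).
SELECT a_key
FROM
  ( SELECT a_key,
           ROW_NUMBER() OVER (PARTITION BY a_value
                              ORDER BY a_timestamp DESC) AS rn
    FROM mytable
  ) AS ranked
WHERE rn > 1 ;
```

If two rows in a group share the maximum timestamp, ROW_NUMBER() omits only one of them, whereas comparisons against the maximum omit both; RANK() would behave like the latter.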

Here are a few more ways to write this query:

WITH mytable AS
  ( --- the query --- )
SELECT a_key
FROM mytable AS t
WHERE EXISTS 
      ( SELECT *
        FROM mytable AS n
        WHERE t.a_value = n.a_value
          AND t.a_timestamp < n.a_timestamp
      ) ;

Using another window function, LEAD():

SELECT a_key
FROM
  ( SELECT a_key,
           (LEAD(a_timestamp) OVER (PARTITION BY a_value
                                    ORDER BY a_timestamp)
            IS NOT NULL) AS ok
    FROM mytable
  ) AS pointless
WHERE ok ;

A variation on the above, using a different condition to check which rows to keep:

           (LEAD(a_value) OVER (ORDER BY a_value, a_timestamp)
            = a_value) AS ok

And a rather weird solution:

SELECT a_key
FROM mytable

EXCEPT

( SELECT DISTINCT ON (a_value) a_key
  FROM mytable
  ORDER BY a_value, a_timestamp DESC
) ;

The NOT IN method (by @joanolo), modified to NOT EXISTS:

SELECT a_key
FROM mytable AS t
WHERE NOT EXISTS
      ( SELECT 1 
        FROM mytable AS m 
        WHERE m.a_value = t.a_value 
        HAVING MAX(m.a_timestamp) = t.a_timestamp
      ) ;

And modified again to a JOIN:

SELECT t.a_key
FROM mytable AS t 
  JOIN
    ( SELECT a_value, max(a_timestamp) AS a_timestamp 
      FROM mytable                   
      GROUP BY a_value 
    ) AS m 
  ON  t.a_value = m.a_value 
  AND t.a_timestamp < m.a_timestamp ;

Regarding performance, I tested the various methods on a small 200K-row table (a real table, not a subquery), with and without indexes.

Since the query needs to return a large majority of rows (more than 50% and could be close to 100% depending on the distribution), I wouldn't expect indexes to be particularly helpful.

The window function solutions (ROW_NUMBER(), RANK(), LEAD()) performed quite well and similarly to each other (less than 2 sec).

The EXISTS method was a bit slower, and the DISTINCT ON method was slowest of these (around 3 sec).

The NOT IN method by @joanolo shows a materialized subplan and was really slow (but it may be more efficient if the mytable subquery returns fewer rows). Modifying it to a similar NOT EXISTS lowered the response time to about 3 seconds. The JOIN modification was somewhat better, around 2 - 2.5 sec.

Without indexes, the plans showed sequential scans, of course. Most methods improved with indexes, doing index scans instead (and lowering response time to about 1-1.5 sec for the window functions and the join methods).

(I used (a_value, a_timestamp) and (a_value, a_timestamp, a_key) indexes and variations changing to timestamp DESC, but the actual indexes are more or less irrelevant to the specific example, since we have no idea how complex the subquery is.)
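
For reference, the indexes mentioned could be declared roughly as follows. This is a sketch that assumes mytable is a real table; the index names are hypothetical.

```sql
-- Index on the grouping column followed by the ordering column.
CREATE INDEX mytable_value_ts_idx
    ON mytable (a_value, a_timestamp);

-- Same, with a_key appended so the queries can be answered from the index alone.
CREATE INDEX mytable_value_ts_key_idx
    ON mytable (a_value, a_timestamp, a_key);

-- Variation with the timestamp in descending order.
CREATE INDEX mytable_value_ts_desc_idx
    ON mytable (a_value, a_timestamp DESC);
```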
