Postgresql – Count nested joined table records only if the table associated via bridge table have two records with given names

countjoin;postgresqlqueryselect

I have the main table on which I have to apply count. I have two bridge tables and two nested tables for has_many relation. You can generate tables and bridge tables with below code and also insert some data

main_table

CREATE TABLE main_table
(
    id        serial PRIMARY KEY,
    is_active BOOLEAN,
    name      VARCHAR(50) UNIQUE NOT NULL
);

nested_table_1

CREATE TABLE nested_table_1
(
    id   serial PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

nested_table_2

CREATE TABLE nested_table_2
(
    id   serial PRIMARY KEY,
    name VARCHAR(50) UNIQUE NOT NULL
);

bridge_table_1

CREATE TABLE bridge_table_1
(
    main_table_id     INTEGER REFERENCES main_table (id),
    nested_table_1_id INTEGER REFERENCES nested_table_1 (id),
    PRIMARY KEY(main_table_id, nested_table_1_id)
);

bridge_table_2

CREATE TABLE bridge_table_2
(
    main_table_id     INTEGER REFERENCES main_table (id),
    nested_table_2_id INTEGER REFERENCES nested_table_2 (id),
    PRIMARY KEY(main_table_id, nested_table_2_id)
);

Query

SELECT COUNT(main.id)
      FILTER (where nes_2.name IN ('nested table second 1', 'nested table second 2') AND
                    nes_1.name = 'Contact')                              AS contact_compliants,
      COUNT(main.id)
      FILTER (where nes_2.name = 'Drop' AND nes_1.name = 'Dropley')      AS droplet_compliant,
      COUNT(main.id) FILTER (where nes_2.name IS NOT NULL AND
                                    nes_1.name IN ('Contact', 'Droplet')) AS total_opportunities
FROM main_table AS main
        INNER JOIN bridge_table_1 AS bt_1
                    ON bt_1.main_table_id = main.id
        INNER JOIN nested_table_1 AS nes_1
                    ON bt_1.nested_table_1_id = nes_1.id
        INNER JOIN bridge_table_2 AS bt_2
                    ON bt_2.main_table_id = main.id
        INNER JOIN nested_table_2 AS nes_2
                    ON nes_2.id = bt_2.nested_table_2_id
WHERE main.is_active = 'true';

The query above is working all great counting everything fine. Just one issue. In the first part for contact_compliants, it has IN which means either one of the names for nested will increment count. which is not my requirement. I want to count only when when both the names matches.

to be clear: How do I make this part where nes_2.name IN ('nested table second 1', 'nested table second 2') to count only when nes_2.name = 'nested table second 1' AND nes_2.name = 'nested table second 2'). I have tried adding double join so that I can compare but no success. I have also seen similar solution but I am not sure, how it will apply here with Having

Insert some data with expected output

INSERT INTO main_table (is_active, name, id) VALUES (true, 'main 1', 1);
INSERT INTO main_table (is_active, name, id) VALUES (true, 'main 2', 2);

INSERT INTO nested_table_1 (name, id) VALUES ('Contact', 1);
INSERT INTO nested_table_1 (name, id) VALUES ('Dropley', 2);

INSERT INTO nested_table_2 (name, id) VALUES ('nested table second 1', 1);
INSERT INTO nested_table_2 (name, id) VALUES ('nested table second 2', 2);
INSERT INTO nested_table_2 (name, id) VALUES ('nested table second 3', 3);
INSERT INTO nested_table_2 (name, id) VALUES ('Drop', 4);

INSERT INTO bridge_table_1 (main_table_id, nested_table_1_id) VALUES (1, 1);
INSERT INTO bridge_table_2 (main_table_id, nested_table_2_id) VALUES (1, 1);
-- Count for contact_compliants should be zero till here but its 1

After inserting all above data, expected output should be
contact_compliants,droplet_compliant,total_opportunities
0,0,1

But its actually
contact_compliants,droplet_compliant,total_opportunities
1,0,1

INSERT INTO bridge_table_2 (main_table_id, nested_table_2_id) VALUES (1, 2);
-- Similarly, Count for contact_compliants should be 1 after inserting above but its 2

Best Answer

Please consider this... not a ready-made solution but a suggestion which should lead you in a proper direction.

In the first part for contact_compliants, it has IN which means either one of the names for nested will increment count. which is not my requirement. I want to count only when when both the names matches.

Both? Then your whole idea with JOIN + count(*) will not work.

The problem is clearly visible if you replace the whole SELECT list with *:

SELECT * FROM main_table AS main
INNER JOIN bridge_table_1 AS bt_1 ON bt_1.main_table_id = main.id
INNER JOIN nested_table_1 AS nes_1 ON bt_1.nested_table_1_id = nes_1.id
INNER JOIN bridge_table_2 AS bt_2 ON bt_2.main_table_id = main.id
INNER JOIN nested_table_2 AS nes_2 ON nes_2.id = bt_2.nested_table_2_id
WHERE main.is_active = 'true';
 id | is_active |  name  | main_table_id | nested_table_1_id | id |  name   | main_table_id | nested_table_2_id | id |         name          
----+-----------+--------+---------------+-------------------+----+---------+---------------+-------------------+----+-----------------------
  1 | t         | main 1 |             1 |                 1 |  1 | Contact |             1 |                 1 |  1 | nested table second 1
(1 row)

See? The FROM clause sees only one row. All the further FILTER tricks are a no-op. This stems from SQL conceptual workflow which is pretty well described in SELECT statement Description.

Looks like you need outer joins to catch the data you need. But then it will explode the rowcount of your result set.

So you need to rethink the whole query, especially this part which calculates contact_compliants. Maybe try to use EXISTS subqueries instead of JOINs?.

SELECT
  (SELECT count(*) FROM ...
   WHERE EXISTS (...)
   AND EXISTS (...)) AS contact_compliants, 
  ... /* more subqueries here */
FROM main_table AS main
WHERE main.is_active;

Side note: I'm not sure if your schema is optimal, maybe the design is too convoluted and you can simplify it?