Postgresql – Limit rows in Correlated Subquery

casepostgresqlquery-performancesubquery

I have one big and two small issues with my query I need help with.

First, I want a SQL (postgres if it matters) query that gets SUM of last two ratings of fruits and makes a new column for each fruit column on original table with corresponding result. Conditions are that it has to have happened in the past (before current row date) and I only want the last 2 records and column for 'rush_order' has to have value 'yes'.

Right now my query gets all records but I only want last 2. I tried to use LIMIT in each of the 4 subqueries and also tried OFFSET 0 and FETCH FIRST 2 ROWS ONLY but can't get it to work. If there is a way to use the OFFSET it could eliminate the need to use the comparison of date < current_row.date as the table will always be ordered by ascending date.

Second, if there is advice on how to use CASE here I think it would be much faster. I am running 4 subequeries for every row as it is; there is too much repetition.

Third, I think another way around this is to stack the name1 and rating1 on top of the name2 and rating2 on top of each other. Not necessary but am open to another way of solving this.

Thank you

CREATE TABLE IF NOT EXISTS fruit (
   fruit_id INTEGER  NOT NULL ,
   name1 varchar(50) NOT NULL,
    name2 varchar(50) NOT NULL,
   rating1 integer NOT NULL,
    rating2 integer NOT NULL,
     delivered_on date NOT NULL,
     rush_order varchar(3) NOT NULL,

   PRIMARY KEY  (fruit_id)
 )

INSERT INTO fruit (fruit_id, name1, name2, rating1, rating2, delivered_on, rush_order) VALUES
(1, 'Apple', 'Banana', 5, 4,'2015-10-01', 'yes'),
(2, 'Pear', 'Banana', 3, 4,'2015-10-27', 'yes'),
(3, 'Apple', 'Pear', 5, 4,'2015-10-29', 'no'),
(4, 'Banana', 'Apple', 2, 5,'2015-11-27', 'yes'),
(5, 'Banana', 'Apple', 4, 1,'2015-11-29', 'yes'),
(6, 'Apple', 'Banana', 4, 2,'2015-12-02', 'no'),
(7, 'Pear', 'Apple', 3, 5,'2015-12-20', 'yes');

SELECT *,
    COALESCE(
        (SELECT sum(f2.rating1)
    FROM fruit f2
        WHERE f1.name1 = f2.name1 AND f1.delivered_on > f2.delivered_on),0) +
    COALESCE(
        (SELECT sum(f3.rating2)
    FROM fruit f3
        WHERE f1.name1 = f3.name2 AND f1.delivered_on > f3.delivered_on) ,0) 
            AS fruit1_rating,
    COALESCE(
        (SELECT sum(f4.rating2)
    FROM fruit f4
        WHERE f1.name2 = f4.name2 AND f1.delivered_on > f4.delivered_on),0) +
    COALESCE(
        (SELECT sum(f5.rating1)
    FROM fruit f5
        WHERE f1.name2 = f5.name1 AND f1.delivered_on > f5.delivered_on) ,0) AS fruit2_rating
FROM fruit f1;

Expected result:
Expected table in csv

Best Answer

The Selected tables in the subqueries must be limited first before summing them up, but i can't reproduce your exspected result

CREATE TABLE IF NOT EXISTS fruit (
   fruit_id INTEGER  NOT NULL ,
   name1 varchar(50) NOT NULL,
    name2 varchar(50) NOT NULL,
   rating1 integer NOT NULL,
    rating2 integer NOT NULL,
     delivered_on date NOT NULL,
     rush_order varchar(3) NOT NULL,

   PRIMARY KEY  (fruit_id)
 );

INSERT INTO fruit (fruit_id, name1, name2, rating1, rating2, delivered_on, rush_order) VALUES
(1, 'Apple', 'Banana', 5, 4,'2015-10-01', 'yes'),
(2, 'Pear', 'Banana', 3, 4,'2015-10-27', 'yes'),
(3, 'Apple', 'Pear', 5, 4,'2015-10-29', 'no'),
(4, 'Banana', 'Apple', 2, 5,'2015-11-27', 'yes'),
(5, 'Banana', 'Apple', 4, 1,'2015-11-29', 'yes'),
(6, 'Apple', 'Banana', 4, 2,'2015-12-02', 'no'),
(7, 'Pear', 'Apple', 3, 5,'2015-12-20', 'yes');
 SELECT *,
    COALESCE(
        (SELECT sum(f2.rating1)
    FROM (SELECt * FROM fruit f2
        WHERE f1.name1 = f2.name1 AND f1.delivered_on > f2.delivered_on AND f2.rush_order = 'yes' ORDER BY f2.delivered_on LIMIT 2) f2),0) +
        COALESCE(
        (SELECT sum(f3.rating1)
    FROM (SELECt * FROM fruit f3
        WHERE f1.name1 = f3.name1 AND f1.delivered_on > f3.delivered_on AND f3.rush_order = 'yes' ORDER BY f3.delivered_on LIMIT 2) f3),0) AS fruit1_rating,
            COALESCE(
        (SELECT sum(f4.rating1)
    FROM (SELECt * FROM fruit f4
        WHERE f1.name1 = f4.name2 AND f1.delivered_on > f4.delivered_on AND f4.rush_order = 'yes' ORDER BY f4.delivered_on LIMIT 2) f4),0) +
        COALESCE(
        (SELECT sum(f5.rating1)
    FROM (SELECt * FROM fruit f5
        WHERE f1.name1 = f5.name2 AND f1.delivered_on > f5.delivered_on AND f5.rush_order = 'yes' ORDER BY f5.delivered_on LIMIT 2) f5),0) AS fruit2_rating
FROM fruit f1;
fruit_id | name1  | name2  | rating1 | rating2 | delivered_on | rush_order | fruit1_rating | fruit2_rating
-------: | :----- | :----- | ------: | ------: | :----------- | :--------- | ------------: | ------------:
       1 | Apple  | Banana |       5 |       4 | 2015-10-01   | yes        |             0 |             0
       2 | Pear   | Banana |       3 |       4 | 2015-10-27   | yes        |             0 |             0
       3 | Apple  | Pear   |       5 |       4 | 2015-10-29   | no         |            10 |             0
       4 | Banana | Apple  |       2 |       5 | 2015-11-27   | yes        |             0 |            16
       5 | Banana | Apple  |       4 |       1 | 2015-11-29   | yes        |             4 |            16
       6 | Apple  | Banana |       4 |       2 | 2015-12-02   | no         |            10 |            12
       7 | Pear   | Apple  |       3 |       5 | 2015-12-20   | yes        |             6 |             0

db<>fiddle here