PostgreSql subquery evaluated without filtering

postgresqlpostgresql-performancequery-performance

I am struggling to optimize the following query:

SELECT tbA.id, tbB.createDate
FROM
 tbA
LEFT OUTER JOIN
 (SELECT id, MAX(createDate) as createDate FROM tb GROUP BY id) as tbB
ON tbA.id=tbB.id
WHERE tbA.someDate >= '2021-05-11' and tbA.someDate <= '2021-05-12'

Both tables have a comparable number of rows and the condition does limit the result to 2000 rows (from approximately 1 mil). The problem is the inner select where the planner does evaluate the select (scan, sort & aggregate) before applying the condition. While the expected behavior would be at first to limit the rows by id from the join (as in MSSQL).
The above query is a simplified version of the real query. I am using a few views and the last/top view is used by the entity framework and contains the conditions.

EDIT 2
The query:

SELECT
    d.*,
    lrm."OriginalDealStatus" AS "MessageStatus"
FROM
    "DealStore"."DealsData" d
LEFT OUTER JOIN
    (SELECT "ReplyMessages"."ForeFrontID",
            max("ReplyMessages"."CreatedDate") AS "LastCreatedDate",
             '' as "OriginalDealStatus"
    FROM "FeedProcessor"."ReplyMessages"
    GROUP BY "ReplyMessages"."ForeFrontID"
    ) lrm
    ON d."ForeFrontID" = lrm."ForeFrontID"
    WHERE d."TradeDay" >= '2021-05-11' AND d."TradeDay" <= '2021-05-12'

The execution plan:

Nested Loop Left Join  (cost=275949.07..299006.19 rows=622 width=327) (actual time=1421.176..1427.758 rows=2093 loops=1)
  ->  Merge Left Join  (cost=275948.64..293934.08 rows=622 width=303) (actual time=1421.124..1422.709 rows=2093 loops=1)
        Merge Cond: (d.ForeFrontID = ReplyMessages.ForeFrontID)
        ->  Sort  (cost=171.93..173.49 rows=622 width=271) (actual time=2.703..2.894 rows=2093 loops=1)
              Sort Key: d.ForeFrontID
              Sort Method: quicksort  Memory: 652kB
              ->  Index Scan using IDX_NC_Deals_TradeDay on Deals d  (cost=0.43..143.07 rows=622 width=271) (actual time=0.016..1.534 rows=2093 loops=1)
                    Index Cond: ((TradeDay >= '2021-05-11'::date) AND (TradeDay <= '2021-05-12'::date))
        ->  GroupAggregate  (cost=275776.71..285877.95 rows=630249 width=44) (actual time=1130.147..1384.325 rows=688123 loops=1)
              Group Key: ReplyMessages.ForeFrontID
              ->  Sort  (cost=275776.71..277676.08 rows=759750 width=4) (actual time=1130.133..1210.091 rows=726161 loops=1)
                    Sort Key: ReplyMessages.ForeFrontID
                    Sort Method: external merge  Disk: 10296kB
                    ->  Seq Scan on ReplyMessages  (cost=0.00..191179.50 rows=759750 width=4) (actual time=0.068..784.905 rows=759705 loops=1)
  ->  Index Scan using pk_dealsxml_forefrontid on DealsXml dx  (cost=0.43..8.12 rows=1 width=388) (actual time=0.002..0.002 rows=1 loops=2093)
        Index Cond: (ForeFrontID = d.ForeFrontID)
Planning Time: 0.653 ms
Execution Time: 1429.023 ms

You may notice that there's an: Seq Scan on ReplyMessages & Sort & GroupAggregate

I would expect Postgres to evaluate the previous query much faster as in the bellow example (have added the LATERAL):

SELECT
    d.*,
    lrm."OriginalDealStatus" AS "MessageStatus"
FROM
    "DealStore"."DealsData" d
LEFT OUTER JOIN LATERAL
    (SELECT "ReplyMessages"."ForeFrontID",
            max("ReplyMessages"."CreatedDate") AS "LastCreatedDate",
             '' as "OriginalDealStatus"
    FROM "FeedProcessor"."ReplyMessages"
    WHERE d."ForeFrontID" = "ForeFrontID"
    GROUP BY "ReplyMessages"."ForeFrontID"
    ) lrm
    ON d."ForeFrontID" = lrm."ForeFrontID"
    WHERE d."TradeDay" >= '2021-05-11' AND d."TradeDay" <= '2021-05-12'

and the execution plan:

Nested Loop Left Join  (cost=1.28..10488.19 rows=622 width=327) (actual time=3.116..15.949 rows=2093 loops=1)
  ->  Nested Loop Left Join  (cost=0.85..5191.86 rows=622 width=655) (actual time=0.035..6.775 rows=2093 loops=1)
        ->  Index Scan using IDX_NC_Deals_TradeDay on Deals d  (cost=0.43..143.07 rows=622 width=271) (actual time=0.022..1.751 rows=2093 loops=1)
              Index Cond: ((TradeDay >= '2021-05-11'::date) AND (TradeDay <= '2021-05-12'::date))
        ->  Index Scan using pk_dealsxml_forefrontid on DealsXml dx  (cost=0.43..8.12 rows=1 width=388) (actual time=0.002..0.002 rows=1 loops=2093)
              Index Cond: (ForeFrontID = d.ForeFrontID)
  ->  Subquery Scan on lrm  (cost=0.42..8.47 rows=1 width=36) (actual time=0.004..0.004 rows=0 loops=2093)
        Filter: (d.ForeFrontID = lrm.ForeFrontID)
        ->  GroupAggregate  (cost=0.42..8.46 rows=1 width=44) (actual time=0.003..0.003 rows=0 loops=2093)
              Group Key: ReplyMessages.ForeFrontID
              ->  Index Only Scan using idx_test2 on ReplyMessages  (cost=0.42..8.44 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=2093)
                    Index Cond: (ForeFrontID = d.ForeFrontID)
                    Heap Fetches: 746
Planning Time: 0.637 ms
Execution Time: 16.139 ms

Best Answer

Surely something as simple as this will work adequately:

SELECT tbA.id, (SELECT MAX(createDate) FROM tb WHERE id = tbA.id) as createDate
FROM tbA
WHERE tbA.someDate >= '2021-05-11' and tbA.someDate <= '2021-05-12'