Postgresql – Postgres 200-minute-long queries: why is the query planner not using index

performancepostgresqlpostgresql-performance

I'm hoping someone here can help me understand why Postgres' query planner is opting for a 200-minute-long sequential scan, rather than using a lightning-fast index.

We have seven sets of (FooClaims, FooClaimLines) tables, e.g. (CarrierClaims, CarrierClaimLines). For the five smaller sets of tables, the query planner is doing the correct thing and utilizing the indexes on both tables when we query them. But for the two largest sets of tables, it opts to use a table scan, instead.

The query I'm having trouble with looks like this (explain output is included further below):

select * from "FooClaims" inner join "FooClaimLines" on "FooClaims"."claimId" = "FooClaimLines"."parentClaim" where "FooClaims"."beneficiaryId" = '12345';

Roughly speaking, the tables look like this (full details included further below):

                          Table "public.FooClaims"
                 Column                  |         Type          | Modifiers 
-----------------------------------------+-----------------------+-----------
 claimId                                 | character varying(15) | not null
 beneficiaryId                           | character varying(15) | not null
... (other not relevant columns)
Indexes:
    "FooClaims_pkey" PRIMARY KEY, btree ("claimId")
    "FooClaims_beneficiaryId_idx" btree ("beneficiaryId")
Foreign-key constraints:
    "FooClaims_beneficiaryId_to_Beneficiaries" FOREIGN KEY ("beneficiaryId") REFERENCES "Beneficiaries"("beneficiaryId")
Referenced by:
    TABLE ""FooClaimLines"" CONSTRAINT "FooClaimLines_parentClaim_to_FooClaims" FOREIGN KEY ("parentClaim") REFERENCES "FooClaims"("claimId")
Tablespace: "fooclaims_ts"

                      Table "public.FooClaimLines"
               Column                |         Type          | Modifiers 
-------------------------------------+-----------------------+-----------
 lineNumber                          | numeric               | not null
 parentClaim                         | character varying(15) | not null
... (other not relevant columns)
Indexes:
    "FooClaimLines_pkey" PRIMARY KEY, btree ("parentClaim", "lineNumber")
Foreign-key constraints:
    "FooClaimLines_parentClaim_to_FooClaims" FOREIGN KEY ("parentClaim") REFERENCES "FooClaims"("claimId")
Tablespace: "fooclaimlines_ts"

These are big tables:

  • CarrierClaims: 3 billion rows
  • CarrierClaimLines: 7 billion rows
  • OutpatientClaims: 600 million rows
  • OutpatientClaimLines: 4 billion rows
  • DMEClaims: 230 million rows
  • DMEClaimLines: 420 million rows
  • Beneficiaries: 66 million rows

For the smaller tables, this query uses all of the indexes and runs in milliseconds. For the two larger tables, it goes for a Seq Scan of the FooClaimLines table and takes about 200 minutes. Gah!

And if I break apart the joined query, it uses the indexes just fine even on the largest tables, e.g.:

select "claimId" from "FooClaims" where "FooClaims"."beneficiaryId" = '12345' limit 5;
select * from "FooClaimLines" where "FooClaimLines"."parentClaim" in ('1', '2', '3', '4', '5');

Also — and this seems really weird to me — if I switch the joined queries to a select count(*) ... instead of a select * ..., the query planner will decide to use the indexes even on the largest tables.

Bluntly: I'm not super familiar with how PostgreSQL makes its query planning decisions, and I'm unsure what might be causing this behavior.

Things I've Tried

  1. Crying.
  2. Running a vacuum freeze analyze on all of the problem table pairs.
  3. And now: asking you, friendly internet stranger, for help.

My backup plan here is, of course, to rejigger our app to not use the inner join query and instead perform the query manually. I'd really prefer to avoid that if at all possible, though.

Full Details

Full Details: explain Output

Here's the explain output for one of the "slow" queries:

> explain select * from "OutpatientClaims" inner join "OutpatientClaimLines" on "OutpatientClaims"."claimId" = "OutpatientClaimLines"."parentClaim" where "OutpatientClaims"."beneficiaryId" = '12345';                                                      
                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=21892.78..135879128.24 rows=39638 width=2110)
   Hash Cond: (("OutpatientClaimLines"."parentClaim")::text = ("OutpatientClaims"."claimId")::text)
   ->  Seq Scan on "OutpatientClaimLines"  (cost=0.00..119688035.24 rows=4311681024 width=233)
   ->  Hash  (cost=21824.66..21824.66 rows=5449 width=1877)
         ->  Bitmap Heap Scan on "OutpatientClaims"  (cost=202.80..21824.66 rows=5449 width=1877)
               Recheck Cond: (("beneficiaryId")::text = '12345'::text)
               ->  Bitmap Index Scan on "OutpatientClaims_beneficiaryId_idx"  (cost=0.00..201.44 rows=5449 width=0)
                     Index Cond: (("beneficiaryId")::text = '12345'::text)
(8 rows)

And for one of the "fast" queries:

> explain select * from "DMEClaims" inner join "DMEClaimLines" on "DMEClaims"."claimId" = "DMEClaimLines"."parentClaim" where "DMEClaims"."beneficiaryId" = '12345';                                                                                                           
                                               QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=85.06..3401617.25 rows=3960 width=411)
   ->  Bitmap Heap Scan on "DMEClaims"  (cost=84.49..8227.57 rows=2054 width=218)
         Recheck Cond: (("beneficiaryId")::text = '12345'::text)
         ->  Bitmap Index Scan on "DMEClaims_beneficiaryId_idx"  (cost=0.00..83.97 rows=2054 width=0)
               Index Cond: (("beneficiaryId")::text = '12345'::text)
   ->  Index Scan using "DMEClaimLines_pkey" on "DMEClaimLines"  (cost=0.57..1646.38 rows=571 width=193)
         Index Cond: (("parentClaim")::text = ("DMEClaims"."claimId")::text)
(7 rows)

Full Details: Other

Update #1: enable_seq_scan = false

If I tell Postgres to only run a sequential scan if it really really has to, the query planner starts behaving:

> set enable_seqscan = false;
SET
Time: 0.545 ms

> explain select * from "OutpatientClaims" inner join "OutpatientClaimLines" on "OutpatientClaims"."claimId" = "OutpatientClaimLines"."parentClaim" where "OutpatientClaims"."beneficiaryId" = '12345';
                                                        QUERY PLAN                                                        
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=203.51..506205086.02 rows=39638 width=2110)
   ->  Bitmap Heap Scan on "OutpatientClaims"  (cost=202.80..21824.66 rows=5449 width=1877)
         Recheck Cond: (("beneficiaryId")::text = '12345'::text)
         ->  Bitmap Index Scan on "OutpatientClaims_beneficiaryId_idx"  (cost=0.00..201.44 rows=5449 width=0)
               Index Cond: (("beneficiaryId")::text = '12345'::text)
   ->  Index Scan using "OutpatientClaimLines_pkey" on "OutpatientClaimLines"  (cost=0.71..92576.04 rows=31867 width=233)
         Index Cond: (("parentClaim")::text = ("OutpatientClaims"."claimId")::text)
(7 rows)

Time: 2.560 ms

That seems like a pretty brute force approach to me, though. Is there a better way to convince the query planner that this is the sane thing to do?

Best Answer

you don't even shows us the definition of "OutpatientClaimLines"."parentClaim" how could we know why it's not using an index on that table?

Moreover, you have a lot of mistakes in this schema.

  1. Stop using capital letters and double quotes anywhere.
  2. No join condition should be on a varchar/text (non-int) field if you've got billions of rows. Use bigint or UUID.
  3. Why are you calling it parentClaim. That's just a foreign key. you don't call all the foreign key's that reference other tables parent. That's a really bad convention. You call something a parent if there is a parent/child relationship.

I think you need a consultant.