PostgreSQL: trying to understand why queries against one table are slower than another

Tags: performance, postgresql, query-performance, select

I have two tables that I am querying. The tables are relatively similar, but queries against one of them take a lot longer than queries against the other, and I can't figure out why. I am running PostgreSQL 9.5.

First, here's what the tables look like:

dbname=# \d+
                             List of relations
 Schema |        Name         | Type  |  Owner   |    Size    | Description
--------+---------------------+-------+----------+------------+-------------
 public | skill_skill         | table | postgres | 32 GB      |
 public | title_skill         | table | postgres | 47 GB      |

dbname=# \d title_skill
   Table "public.title_skill"
   Column   |  Type  | Modifiers
------------+--------+-----------
 title_name | text   | not null
 skill_name | text   | not null
 count      | bigint | not null
Indexes:
    "title_skill_idx" btree (title_name, skill_name)
    "title_skill_skill_idx" btree (skill_name)

dbname=# \d skill_skill
    Table "public.skill_skill"
   Column    |  Type  | Modifiers
-------------+--------+-----------
 skill1_name | text   | not null
 skill2_name | text   | not null
 count       | bigint | not null
Indexes:
    "skill_skill_idx" btree (skill1_name, skill2_name)
    "skill_skill_skill1_idx" btree (skill1_name)

And here's the results of an explain analyze against the faster table:

dbname=# explain analyze select * from skill_skill where skill1_name = 'java' order by count desc limit 1000;
                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=893197.90..893200.40 rows=1000 width=47) (actual time=185.662..186.053 rows=1000 loops=1)
   ->  Sort  (cost=893197.90..893886.82 rows=275566 width=47) (actual time=185.660..185.832 rows=1000 loops=1)
         Sort Key: count DESC
         Sort Method: top-N heapsort  Memory: 127kB
         ->  Bitmap Heap Scan on skill_skill  (cost=7544.33..878088.92 rows=275566 width=47) (actual time=18.496..138.866 rows=228547 loops=1)
               Recheck Cond: (skill1_name = 'java'::text)
               Heap Blocks: exact=1889
               ->  Bitmap Index Scan on skill_skill_skill1_idx  (cost=0.00..7475.44 rows=275566 width=0) (actual time=17.912..17.912 rows=228547 loops=1)
                     Index Cond: (skill1_name = 'java'::text)
 Planning time: 0.103 ms
 Execution time: 186.180 ms
(11 rows)

Here's the results against the slow table:

dbname=# explain analyze select * from title_skill where skill_name = 'java' order by count desc limit 1000;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=142190.06..142192.56 rows=1000 width=56) (actual time=106267.537..106267.918 rows=1000 loops=1)
   ->  Sort  (cost=142190.06..142282.40 rows=36934 width=56) (actual time=106267.537..106267.693 rows=1000 loops=1)
         Sort Key: count DESC
         Sort Method: top-N heapsort  Memory: 127kB
         ->  Bitmap Heap Scan on title_skill  (cost=926.81..140165.01 rows=36934 width=56) (actual time=178.043..105714.465 rows=627761 loops=1)
               Recheck Cond: (skill_name = 'java'::text)
               Rows Removed by Index Recheck: 55084364
               Heap Blocks: exact=5323 lossy=600606
               ->  Bitmap Index Scan on title_skill_skill_idx  (cost=0.00..917.58 rows=36934 width=0) (actual time=172.088..172.088 rows=627761 loops=1)
                     Index Cond: (skill_name = 'java'::text)
 Planning time: 0.102 ms
 Execution time: 106268.059 ms
(12 rows)

And here's the row count from each table:

dbname=# select count(*) from title_skill;
   count
-----------
 565424890
(1 row)

dbname=# select count(*) from skill_skill;
   count
-----------
 435104200
(1 row)

Looking at both tables, they are relatively similar in size, indexes, and row counts. For this reason, I am having a hard time understanding why one of them is so much slower than the other. Any help with this is greatly appreciated.

Best Answer

Putting the index on (skill_name, count) rather than just (skill_name) would help a lot. That way it could walk the index directly to get the 1000 highest counts within the 'java' skill, rather than needing to sort 627761 rows and throw away 99.8% of them.
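As a sketch (the index name here is my own invention, and on a 47 GB table you would want CONCURRENTLY to avoid blocking writes during the build), that index might look like:

```sql
-- A composite index whose second column matches the ORDER BY lets
-- PostgreSQL read the top-1000 rows for a given skill straight off
-- the index, with no Sort node at all.
CREATE INDEX CONCURRENTLY title_skill_skill_count_idx
    ON title_skill (skill_name, count DESC);
```

After building it, EXPLAIN ANALYZE on the same query should show an Index Scan with no Sort step, and the old title_skill_skill_idx becomes redundant and could be dropped.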

But, that doesn't explain the situation, it just circumvents it, so I'll ignore that solution for the sake of providing an explanation.

The faster table is well clustered. It needs to visit 228547 rows, but only needs to read 1889 pages to do so. On each page it reads, it finds about 120 rows that meet the condition skill1_name = 'java'. (120 rows is about as many as will fit on a page, so those pages are packed full of nothing but 'java' entries.)

The slower table is not well clustered on that column. To visit 627761 rows, it needs to read 605929 pages (5323 exact plus 600606 lossy), or just under 1.04 relevant rows per page.

Worse, many of those pages are "lossy", which certainly slows things down. Increasing work_mem could help there. Since the faster query visits roughly a third as many rows, its bitmap does not overflow work_mem the way the slower one's does. Even if it were the other way around and the well-clustered table were the one overflowing, the situation would still be much better. When the bitmap overflows, it degrades to tracking whole pages rather than individual rows, so every row on each lossy page must be inspected to recheck the condition. If 120 rows per page satisfy that recheck, this is no big deal. When only 1.04 rows per page satisfy the recheck, that is a lot of extra work for nothing.
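To test whether work_mem is the problem, you can raise it for a single session and re-run the query; the 256MB figure below is only illustrative, since the value actually needed depends on how many tuple pointers the bitmap has to hold:

```sql
-- Raise work_mem for this session only, then re-run the query to see
-- whether the "lossy" heap blocks disappear from the plan.
SET work_mem = '256MB';

EXPLAIN ANALYZE
SELECT * FROM title_skill
 WHERE skill_name = 'java'
 ORDER BY count DESC
 LIMIT 1000;

RESET work_mem;
```

If the new plan reports only exact heap blocks (no "lossy=" figure and no "Rows Removed by Index Recheck"), the bitmap now fits in memory.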

If increasing work_mem is not enough, you could also cluster the table on title_skill_skill_idx, which would put all of the rows with a skill of 'java' next to each other on disk, as they are in the faster case. Of course, if the table is already well clustered on another column (title_name, for example), changing the clustering could improve this query only to make different ones worse.
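If you do go the clustering route, it would look something like the following. Be aware that CLUSTER takes an ACCESS EXCLUSIVE lock and rewrites the entire 47 GB table, so it needs a maintenance window and enough free disk space for a second copy:

```sql
-- Rewrite the table in index order so rows sharing a skill_name land
-- on the same heap pages. This is a one-time reordering: PostgreSQL
-- does not maintain the clustering as new rows are inserted.
CLUSTER title_skill USING title_skill_skill_idx;

-- Refresh planner statistics (including the correlation estimate)
-- so the planner knows about the new physical ordering.
ANALYZE title_skill;
```

The ordering decays as the table is modified, so a table with heavy write traffic would need periodic re-clustering to keep the benefit.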