I have a table users with 2.6 million rows, and SELECT COUNT(*) FROM users takes 2 seconds to complete, but the same query against the table stories with 5.2 million rows takes more than an hour.
The EXPLAIN plans are pretty similar:
explain select count(*) from users;
Finalize Aggregate (cost=89417.87..89417.88 rows=1 width=8)
-> Gather (cost=89417.65..89417.86 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=88417.65..88417.66 rows=1 width=8)
-> Parallel Seq Scan on users (cost=0.00..85702.72 rows=1085972 width=0)
explain select count(*) from stories;
Finalize Aggregate (cost=428235.66..428235.67 rows=1 width=8)
-> Gather (cost=428235.45..428235.66 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=427235.45..427235.46 rows=1 width=8)
-> Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories (cost=0.43..421752.81 rows=2193057 width=0)
Postgres version:
PostgreSQL 10.6 (Ubuntu 10.6-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit
Table definition of stories:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------------+---------+-----------+----------+---------------------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('stories_id_seq'::regclass) | plain | |
rating | integer | | not null | | plain | |
number_of_pluses | integer | | not null | | plain | |
number_of_minuses | integer | | not null | | plain | |
title | text | | not null | | extended | |
content_blocks | jsonb | | not null | | extended | |
created_at_timestamp | bigint | | not null | | plain | |
story_url | text | | not null | | extended | |
tags | jsonb | | not null | | extended | |
number_of_comments | integer | | not null | | plain | |
is_deleted | boolean | | not null | | plain | |
is_rating_hidden | boolean | | not null | | plain | |
has_mine_tag | boolean | | not null | | plain | |
has_adult_tag | boolean | | not null | | plain | |
is_longpost | boolean | | not null | | plain | |
author_id | bigint | | not null | | plain | |
author_username | text | | not null | | extended | |
author_profile_url | text | | not null | | extended | |
author_avatar_url | text | | not null | | extended | |
community_link | text | | not null | | extended | |
community_name | text | | not null | | extended | |
comments_are_hot | boolean | | not null | | plain | |
added_timestamp | bigint | | not null | | plain | |
last_update_timestamp | bigint | | not null | | plain | |
next_update_timestamp | bigint | | not null | | plain | |
task_taken_at_timestamp | bigint | | not null | | plain | |
is_permanently_deleted | boolean | | not null | false | plain | |
Indexes:
"stories_pkey" PRIMARY KEY, btree (id)
"stories__added_timestamp__idx" btree (added_timestamp)
"stories__is_permanently_deleted__idx" btree (is_permanently_deleted)
"stories__last_update_timestamp__idx" btree (last_update_timestamp)
"stories__next_update_timestamp__idx" btree (next_update_timestamp)
"stories__task_taken_at_timestamp__idx" btree (task_taken_at_timestamp)
Index definition:
Index "public.stories__is_permanently_deleted__idx"
Column | Type | Definition | Storage
------------------------+---------+------------------------+---------
is_permanently_deleted | boolean | is_permanently_deleted | plain
btree, for table "public.stories"
After reindexing (as advised), EXPLAIN (ANALYZE, BUFFERS) select count(*) from stories:
Finalize Aggregate (cost=356218.22..356218.23 rows=1 width=8) (actual time=273577.971..273577.971 rows=1 loops=1)
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
-> Gather (cost=356218.01..356218.22 rows=2 width=8) (actual time=272647.858..273602.243 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
-> Partial Aggregate (cost=355218.01..355218.02 rows=1 width=8) (actual time=272938.947..272938.948 rows=1 loops=3)
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
-> Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories (cost=0.43..349741.33 rows=2190671 width=0) (actual time=0.386..271148.590 rows=1752497 loops=3)
Heap Fetches: 654818
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
Planning time: 0.726 ms
Execution time: 273602.447 ms
Best Answer
While you have a Parallel Seq Scan for users, you get a Parallel Index Only Scan for stories - which is typically faster than a sequential scan on the table. If it's that slow, the obvious reason would be index bloat (or, worse, index corruption).
Recreate the index and test again to see whether that's the case. If it is, investigate what bloats (or corrupts) your index. Corruption should be an extremely rare exception - unless you operate with faulty RAM / storage.
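For example, on PostgreSQL 10 the rebuild could look like this (a sketch; the pgstattuple check assumes the contrib extension is available, and REINDEX ... CONCURRENTLY only exists from PostgreSQL 12, so a plain REINDEX blocks writes to the table while it runs):

-- Optional: inspect the index first; a low avg_leaf_density hints at bloat.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM   pgstatindex('stories__is_permanently_deleted__idx');

-- Rebuild in place (blocks writes for the duration):
REINDEX INDEX stories__is_permanently_deleted__idx;

-- Or, to keep writes flowing: build a fresh index concurrently,
-- drop the bloated one, and rename. The __idx2 name is just a placeholder.
CREATE INDEX CONCURRENTLY stories__is_permanently_deleted__idx2
    ON stories (is_permanently_deleted);
DROP INDEX CONCURRENTLY stories__is_permanently_deleted__idx;
ALTER INDEX stories__is_permanently_deleted__idx2
    RENAME TO stories__is_permanently_deleted__idx;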
If you don't need exact counts, then there are much faster alternatives:
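One common alternative is the planner's row estimate from the system catalog, which returns instantly but is only approximate (a minimal sketch; accuracy depends on how recently autovacuum / ANALYZE ran):

-- Approximate row count maintained by autovacuum / ANALYZE.
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.stories'::regclass;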
Aside: reorder columns in stories to save ~ 20 bytes per row; see the sketch below.
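The exact ordering from the original answer is not reproduced here; the following is a sketch assuming the usual "column tetris" rule: fixed-width columns first in descending alignment (bigint, then integer, then boolean), variable-length columns last. Constraints and defaults are carried over from the definition above:

CREATE TABLE stories (
    -- 8-byte alignment: bigint columns
    id                      bigint  PRIMARY KEY DEFAULT nextval('stories_id_seq'::regclass),
    author_id               bigint  NOT NULL,
    created_at_timestamp    bigint  NOT NULL,
    added_timestamp         bigint  NOT NULL,
    last_update_timestamp   bigint  NOT NULL,
    next_update_timestamp   bigint  NOT NULL,
    task_taken_at_timestamp bigint  NOT NULL,
    -- 4-byte alignment: integer columns
    rating                  integer NOT NULL,
    number_of_pluses        integer NOT NULL,
    number_of_minuses       integer NOT NULL,
    number_of_comments      integer NOT NULL,
    -- 1-byte alignment: booleans pack together with no padding
    is_deleted              boolean NOT NULL,
    is_rating_hidden        boolean NOT NULL,
    has_mine_tag            boolean NOT NULL,
    has_adult_tag           boolean NOT NULL,
    is_longpost             boolean NOT NULL,
    comments_are_hot        boolean NOT NULL,
    is_permanently_deleted  boolean NOT NULL DEFAULT false,
    -- variable-length columns last
    title                   text    NOT NULL,
    story_url               text    NOT NULL,
    author_username         text    NOT NULL,
    author_profile_url      text    NOT NULL,
    author_avatar_url       text    NOT NULL,
    community_link          text    NOT NULL,
    community_name          text    NOT NULL,
    content_blocks          jsonb   NOT NULL,
    tags                    jsonb   NOT NULL
);

Most of the gain comes from the boolean columns that currently sit directly before bigint columns, each of which forces up to 7 bytes of alignment padding.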