I have a table users with 2.6 million rows, and SELECT COUNT(*) FROM users takes 2 seconds to complete, but the same query against the table stories with 5.2 million rows takes more than an hour.
The EXPLAIN plans are pretty similar:
explain select count(*) from users;
Finalize Aggregate (cost=89417.87..89417.88 rows=1 width=8)
-> Gather (cost=89417.65..89417.86 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=88417.65..88417.66 rows=1 width=8)
-> Parallel Seq Scan on users (cost=0.00..85702.72 rows=1085972 width=0)
explain select count(*) from stories;
Finalize Aggregate (cost=428235.66..428235.67 rows=1 width=8)
-> Gather (cost=428235.45..428235.66 rows=2 width=8)
Workers Planned: 2
-> Partial Aggregate (cost=427235.45..427235.46 rows=1 width=8)
-> Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories (cost=0.43..421752.81 rows=2193057 width=0)
Postgres version:
PostgreSQL 10.6 (Ubuntu 10.6-0ubuntu0.18.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 7.3.0-16ubuntu3) 7.3.0, 64-bit
Table definition of stories:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
-------------------------+---------+-----------+----------+---------------------------------------------------+----------+--------------+-------------
id | bigint | | not null | nextval('stories_id_seq'::regclass) | plain | |
rating | integer | | not null | | plain | |
number_of_pluses | integer | | not null | | plain | |
number_of_minuses | integer | | not null | | plain | |
title | text | | not null | | extended | |
content_blocks | jsonb | | not null | | extended | |
created_at_timestamp | bigint | | not null | | plain | |
story_url | text | | not null | | extended | |
tags | jsonb | | not null | | extended | |
number_of_comments | integer | | not null | | plain | |
is_deleted | boolean | | not null | | plain | |
is_rating_hidden | boolean | | not null | | plain | |
has_mine_tag | boolean | | not null | | plain | |
has_adult_tag | boolean | | not null | | plain | |
is_longpost | boolean | | not null | | plain | |
author_id | bigint | | not null | | plain | |
author_username | text | | not null | | extended | |
author_profile_url | text | | not null | | extended | |
author_avatar_url | text | | not null | | extended | |
community_link | text | | not null | | extended | |
community_name | text | | not null | | extended | |
comments_are_hot | boolean | | not null | | plain | |
added_timestamp | bigint | | not null | | plain | |
last_update_timestamp | bigint | | not null | | plain | |
next_update_timestamp | bigint | | not null | | plain | |
task_taken_at_timestamp | bigint | | not null | | plain | |
is_permanently_deleted | boolean | | not null | false | plain | |
Indexes:
"stories_pkey" PRIMARY KEY, btree (id)
"stories__added_timestamp__idx" btree (added_timestamp)
"stories__is_permanently_deleted__idx" btree (is_permanently_deleted)
"stories__last_update_timestamp__idx" btree (last_update_timestamp)
"stories__next_update_timestamp__idx" btree (next_update_timestamp)
"stories__task_taken_at_timestamp__idx" btree (task_taken_at_timestamp)
Index definition:
Index "public.stories__is_permanently_deleted__idx"
Column | Type | Definition | Storage
------------------------+---------+------------------------+---------
is_permanently_deleted | boolean | is_permanently_deleted | plain
btree, for table "public.stories"
After reindexing (as advised), EXPLAIN (ANALYZE, BUFFERS) select count(*) from stories:
Finalize Aggregate (cost=356218.22..356218.23 rows=1 width=8) (actual time=273577.971..273577.971 rows=1 loops=1)
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
-> Gather (cost=356218.01..356218.22 rows=2 width=8) (actual time=272647.858..273602.243 rows=3 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
-> Partial Aggregate (cost=355218.01..355218.02 rows=1 width=8) (actual time=272938.947..272938.948 rows=1 loops=3)
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
-> Parallel Index Only Scan using stories__is_permanently_deleted__idx on stories (cost=0.43..349741.33 rows=2190671 width=0) (actual time=0.386..271148.590 rows=1752497 loops=3)
Heap Fetches: 654818
Buffers: shared hit=186467 read=65977 dirtied=24 written=1166
Planning time: 0.726 ms
Execution time: 273602.447 ms
Best Answer
While you have a Parallel Seq Scan for users, you get a Parallel Index Only Scan for stories - which is typically faster than a sequential scan on the table. If it's that slow, the obvious reason would be index bloat (or, worse, index corruption).
Recreate the index and test again to see whether that's the case. If it is, investigate what bloats (or corrupts) your index. Corruption should be an extremely rare exception - unless you operate with faulty RAM / storage.
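For example, on PostgreSQL 10 the rebuild could look like this (a sketch; the pgstattuple check assumes the contrib extension is available, and REINDEX ... CONCURRENTLY only exists from PostgreSQL 12, so a plain REINDEX blocks writes to the table while it runs):

-- Optional: inspect the index first; a low avg_leaf_density hints at bloat.
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT avg_leaf_density, leaf_fragmentation
FROM   pgstatindex('stories__is_permanently_deleted__idx');

-- Rebuild in place (blocks writes for the duration):
REINDEX INDEX stories__is_permanently_deleted__idx;

-- Or, to keep writes flowing: build a fresh index concurrently,
-- drop the bloated one, and rename. The __idx2 name is just a placeholder.
CREATE INDEX CONCURRENTLY stories__is_permanently_deleted__idx2
    ON stories (is_permanently_deleted);
DROP INDEX CONCURRENTLY stories__is_permanently_deleted__idx;
ALTER INDEX stories__is_permanently_deleted__idx2
    RENAME TO stories__is_permanently_deleted__idx;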
If you don't need exact counts, then there are much faster alternatives:
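One common alternative is the planner's row estimate from the system catalog, which returns instantly but is only approximate (a minimal sketch; accuracy depends on how recently autovacuum / ANALYZE ran):

-- Approximate row count maintained by autovacuum / ANALYZE.
SELECT reltuples::bigint AS estimated_rows
FROM   pg_class
WHERE  oid = 'public.stories'::regclass;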
Aside: reorder columns in stories to save ~ 20 bytes per row; see the sketch below.
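The exact ordering from the original answer is not reproduced here; the following is a sketch assuming the usual "column tetris" rule: fixed-width columns first in descending alignment (bigint, then integer, then boolean), variable-length columns last. Constraints and defaults are carried over from the definition above:

CREATE TABLE stories (
    -- 8-byte alignment: bigint columns
    id                      bigint  PRIMARY KEY DEFAULT nextval('stories_id_seq'::regclass),
    author_id               bigint  NOT NULL,
    created_at_timestamp    bigint  NOT NULL,
    added_timestamp         bigint  NOT NULL,
    last_update_timestamp   bigint  NOT NULL,
    next_update_timestamp   bigint  NOT NULL,
    task_taken_at_timestamp bigint  NOT NULL,
    -- 4-byte alignment: integer columns
    rating                  integer NOT NULL,
    number_of_pluses        integer NOT NULL,
    number_of_minuses       integer NOT NULL,
    number_of_comments      integer NOT NULL,
    -- 1-byte alignment: booleans pack together with no padding
    is_deleted              boolean NOT NULL,
    is_rating_hidden        boolean NOT NULL,
    has_mine_tag            boolean NOT NULL,
    has_adult_tag           boolean NOT NULL,
    is_longpost             boolean NOT NULL,
    comments_are_hot        boolean NOT NULL,
    is_permanently_deleted  boolean NOT NULL DEFAULT false,
    -- variable-length columns last
    title                   text    NOT NULL,
    story_url               text    NOT NULL,
    author_username         text    NOT NULL,
    author_profile_url      text    NOT NULL,
    author_avatar_url       text    NOT NULL,
    community_link          text    NOT NULL,
    community_name          text    NOT NULL,
    content_blocks          jsonb   NOT NULL,
    tags                    jsonb   NOT NULL
);

Most of the gain comes from the boolean columns that currently sit directly before bigint columns, each of which forces up to 7 bytes of alignment padding.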