MySQL vs PostgreSQL: Benchmarking COUNT(*) execution speed

I benchmark DBs to find out the best for my project and I found that count(*) is extremely slow in PostgeSQL. And I don't understand is it a normal behaviour of PostgeSQL or I do something wrong.

I have a table with ~200M records. MySQL table definition:

CREATE TABLE t1 (
  id int(11) NOT NULL AUTO_INCREMENT,
  t2_id int(11) NOT NULL,
....  
  PRIMARY KEY (id),
  KEY index_t2 (t2_id)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Request (returns ~30M):

SELECT COUNT(*) FROM t1 WHERE t2_id = 7;

runs:

25,797ms MySQL (v5.7.11)

1,222,168ms PostgeSQL (v9.5)

Explain:

MySQL:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: t1
   partitions: NULL
         type: ref
possible_keys: index_t2
          key: index_t2
      key_len: 4
          ref: const
         rows: 59438630
     filtered: 100.00
        Extra: Using index
1 row in set, 1 warning (0.00 sec)

PostgreSQL

Aggregate  (cost=4469365.02..4469365.03 rows=1 width=0)
 ->  Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=0)
       Recheck Cond: (t2_id = 7)
       ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0)
             Index Cond: (t2_id = 7)

Server: AWS RDS (db.r3.xlarge) vCPU:4 Memory:30Gb

Updated (2016-09-20):

> explain (analyze, buffers) SELECT COUNT(*) FROM t1 WHERE t2_id = 7;

QUERY PLAN                                                                                     
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4469365.02..4469365.03 rows=1 width=4) (actual time=1213456.539..1213456.539 rows=1 loops=1)
   Buffers: shared read=2734808
   ->  Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=4) (actual time=64015.828..1205542.421 rows=31383566 loops=1)
         Recheck Cond: (t2_id = 7)
         Rows Removed by Index Recheck: 108582028
         Heap Blocks: exact=19929 lossy=2606242
         Buffers: shared read=2734808
         ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0) (actual time=64009.598..64009.598 rows=31383566 loops=1)
               Index Cond: (t2_id = 7)
               Buffers: shared read=108637
 Planning time: 0.080 ms
 Execution time: 1213456.891 ms
(12 rows)

Time: 1213484.579 ms

Updated (2016-09-21):

> explain (analyze, buffers) SELECT t2_id FROM t1 WHERE t2_id = 7;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=715817.34..4382635.74 rows=34691712 width=114) (actual time=59954.834..1234070.436 rows=31383566 loops=1)
   Recheck Cond: (t2_id = 7)
   Rows Removed by Index Recheck: 108582028
   Heap Blocks: exact=19929 lossy=2606242
   Buffers: shared hit=4824 read=2729984
   ->  Bitmap Index Scan on index_t2  (cost=0.00..707144.41 rows=34691712 width=0) (actual time=59948.598..59948.598 rows=31383566 loops=1)
         Index Cond: (t2_id = 7)
         Buffers: shared hit=4824 read=103813
 Planning time: 0.086 ms
 Execution time: 1239826.408 ms
(10 rows)

Time: 1239827.053 ms

Best Answer

The way that both RDBMS do the count differs. In InnoDB we have the following behaviour by default:

To process a SELECT COUNT(*) FROM t statement, InnoDB scans an index of the table, which takes some time if the index is not entirely in the buffer pool.

For Postgres, you may want to try to see if an index-only scan (which is closer to the InnoDB behaviour) can help you on this. More info here. Due the amount of rows and the bad cardinality of that value (almost 15% of the table according stats), I can't warranty that it will work, but you can try:

SELECT COUNT(t2_id) FROM t1 WHERE t2_id = 7;

Best Answer

Related Solutions

Postgresql – Optimizing ORDER BY in a full text search query

PostgreSQL Indexing – How to Index WHERE (start_date >= ‘2013-12-15’)

Related Question