Postgresql – Why seq-scan can be much faster than index-scan and index-only-scan in this simple query

indexperformancepostgresqlpostgresql-performance

I am using PostgreSQL 9.4.4. I have a query like this:

 SELECT COUNT(*) FROM A,B WHERE A.a = B.b

a and b are the Primary Keys of tables A and B, so there are B-indexes on a & b

By default, PostgreSQL will use seq-scan on AB and use hash join, I force it to do the index-scan and index-only-scan.

The result showed that, seq scan is much faster than the other two, it takes more time to do the full scan on a,b for index-scan and index-only-scan.

EXPLAIN ANALYZE SELECT COUNT(*) FROM journal,paper WHERE journal.paper_id = paper.paper_id;

Can someone explain it?

Thank you so much!

Best Answer

This is quite a common query (pardon the pun! :-) ) from people running queries which perform full table scans (FTS), when the poster feels that the system should make use of the index(es).

Basically, it boils down to the explanation given here. If the tables are so small, the optimiser will say that "it's not worth the bother of going to the index, doing a lookup and then fetching the data, instead, I'll just slurp in all the data and pick out what I need", i.e. perform an FTS.

[EDIT in answer to @txsing's comment]

For an MVCC (multi-version concurrency control) database, you have to traverse every record for a count at a given moment - that's why, for example, a COUNT(*) is much more expensive for MySQL's InnoDB rather than MyISAM.

An excellent explantion (for PostgreSQL) is available here. The guy who wrote this post is a "major contributor" to PostgreSQL (thanks to @dezso for leading me to that post).

Best Answer

Related Solutions

Postgresql – Why does PostgreSQL aggregate result in the Index or Seq scan outputting all columns of the table

Postgresql – Why PostreSQL 9.2 does not use index scan if it can

Related Question