Postgresql – Why seq-scan can be much faster than index-scan and index-only-scan in this simple query

indexperformancepostgresqlpostgresql-performance

I am using PostgreSQL 9.4.4. I have a query like this:

 SELECT COUNT(*) FROM A,B WHERE A.a = B.b

a and b are the Primary Keys of tables A and B, so there are B-indexes on a & b

By default, PostgreSQL will use seq-scan on AB and use hash join, I force it to do the index-scan and index-only-scan.

The result showed that, seq scan is much faster than the other two, it takes more time to do the full scan on a,b for index-scan and index-only-scan.

EXPLAIN ANALYZE SELECT COUNT(*) FROM journal,paper WHERE journal.paper_id = paper.paper_id;

enter image description here

Can someone explain it?

Thank you so much!

Best Answer

This is quite a common query (pardon the pun! :-) ) from people running queries which perform full table scans (FTS), when the poster feels that the system should make use of the index(es).

Basically, it boils down to the explanation given here. If the tables are so small, the optimiser will say that "it's not worth the bother of going to the index, doing a lookup and then fetching the data, instead, I'll just slurp in all the data and pick out what I need", i.e. perform an FTS.

[EDIT in answer to @txsing's comment]

For an MVCC (multi-version concurrency control) database, you have to traverse every record for a count at a given moment - that's why, for example, a COUNT(*) is much more expensive for MySQL's InnoDB rather than MyISAM.

An excellent explantion (for PostgreSQL) is available here. The guy who wrote this post is a "major contributor" to PostgreSQL (thanks to @dezso for leading me to that post).