Mysql – Postgres heap table performance on SSD

MySQLperformancepostgresqlpostgresql-performancequery-performancerdbms

I've read Performance difference between MySQL and PostgreSQL for the same schema/queries.. Is it still relevant with appearance of Solid State Drive?

That article said that Postgres is better for complex queries and subquering, and worse for simple queries by ids and sequential scans.

Postgres is worse/slower for some queries due each table is a heap, means hasn't clustered index, means rows are not physically sorted on hard drive by their primary key. So if you want to read a significant portion of records (by id as I understand) from Postgres – that will lead to many random I/O and poorer performance if compare with MySQL.

Questions:

But how about SSD? Have sequential reads got same performance for Postgres and MySQL?
Also does it make any sense to use SSD instead of HHD with RDBMs?

Update: related to Rick James' answer

MySQL use Plan A and Postgres use Plan B, right?
Plan A means clustered index, right? The table is physically sorted by lastName + firstName.
Plan A on HDD could perform better that Plan B on SSD, so Postgres + SSD is not an silver bullet. You need to choose MySQL for quering clustered index, right?
Pls more details about which queries do you mean by I/O bound? In my understanding any call to database == I/O bound operation.

Best Answer

"Count the disk hits."

Let's construct a simplistic table and query against it. Then let's discuss how two database engines will perform that query.

We have a table with lastname and firstname, plus other stuff. The table is quite big; so big that it cannot be cached in RAM. The query needs to look up all the people with a given last name and some first initial:

SELECT * FROM tbl
    WHERE lastname = 'James'
      AND firstname LIKE 'R%';

Plan A: The table is ordered by lastname+firstname using a B+Tree. The operation work like this:

Locate the first (James,R) row. This requires drilling down the layers of a B+Tree of depth perhaps 4. It is very likely that all non-leaf nodes in such a tree will stay cached. But the leaf node won't be (because of the size of the atable). Cost (so far): 0 disk hits for the non-leaf nodes, plus 1 disk hit for the leaf node.
Scan forward through the B+Tree. Let's say the typical block in the tree holds 100 records. Let's say there are 300 names that match. Cost: About 3 more disk hits.

Total cost: ~4 disk hits.

Plan B: There is an INDEX(lastname, firstname) to hash-pointer, which then leads to the data.

Traverse that index to find all the 300 matching names. If the INDEX is something like a BTree, that is, again, ~4 disk hits.
For each of the 300 names, do a hash lookup in the data to get SELECT *. Since the look up involves a hash, the records will be scattered around. And, since I hypoticated that the table was too big to be cached, that is upwards of 300 disk hits to find the records.

Total cost: Somewhere between 4 and 304; let's be generous and say 100.

Further, let's say there is 1ms of CPU for the entire query. And let's say 10ms for HDD disk hit and 0.5ms for SDD disk hit.

Plan             HDD (ms)   SDD (ms)   RAM (ms, if the table were fully cached)
Plan A: B+Tree    41          3          1
Plan B: Hash    1001         51          1

(Side note: Indexing on UUID in any database engine is like throwing out BTree and using only Hash.)

Until SSDs become as fast as RAM, there will be a performance gap.

As your you Question 2 (Does it make sense to use SSDs): If you are I/O-bound, then yes. Otherwise, it may not be "cost-effective". An RDBMS tries to cache everything, thereby eliminating I/O (for reads); if the data is small enough, the I/O speed does not matter.

On the other hand, writes are much more likely to be dependent on I/O speed. But that is another discussion (with a similar conclusion).

Responding to question update:

Disclosure: I am very familiar with MyISAM; Plan A is very close to MySQL's InnoDB. I am very unfamiliar with Postgres; I do not know if Plan B relates to it.
Yes.
Yes.
It depends. A SELECT where everything it needs is cached in ram: no I/O. A write in a transaction with setting at maximum security: always I/O. And there is lots of "in between".

Update: related to Rick James' answer

Best Answer

Related Solutions

Postgresql – Postgres Write Performance on Intel S3700 SSD

Postgresql – Postgres – Insert performance with indexes

Related Question