I've read Performance difference between MySQL and PostgreSQL for the same schema/queries.. Is it still relevant with appearance of Solid State Drive?
That article said that Postgres is better for complex queries and subquering, and worse for simple queries by ids and sequential scans.
Postgres is worse/slower for some queries due each table is a heap, means hasn't clustered index, means rows are not physically sorted on hard drive by their primary key. So if you want to read a significant portion of records (by id as I understand) from Postgres – that will lead to many random I/O and poorer performance if compare with MySQL.
Questions:
- But how about SSD? Have sequential reads got same performance for Postgres and MySQL?
- Also does it make any sense to use SSD instead of HHD with RDBMs?
Update: related to Rick James' answer
- MySQL use
Plan A
and Postgres usePlan B
, right? Plan A
means clustered index, right? The table is physically sorted bylastName + firstName
.Plan A
on HDD could perform better thatPlan B
on SSD, so Postgres + SSD is not an silver bullet. You need to choose MySQL for quering clustered index, right?- Pls more details about which queries do you mean by I/O bound? In my understanding
any call to database == I/O bound operation
.
Best Answer
"Count the disk hits."
Let's construct a simplistic table and query against it. Then let's discuss how two database engines will perform that query.
We have a table with lastname and firstname, plus other stuff. The table is quite big; so big that it cannot be cached in RAM. The query needs to look up all the people with a given last name and some first initial:
Plan A: The table is ordered by lastname+firstname using a B+Tree. The operation work like this:
Total cost: ~4 disk hits.
Plan B: There is an
INDEX(lastname, firstname)
to hash-pointer, which then leads to the data.SELECT *
. Since the look up involves a hash, the records will be scattered around. And, since I hypoticated that the table was too big to be cached, that is upwards of 300 disk hits to find the records.Total cost: Somewhere between 4 and 304; let's be generous and say 100.
Further, let's say there is 1ms of CPU for the entire query. And let's say 10ms for HDD disk hit and 0.5ms for SDD disk hit.
(Side note: Indexing on UUID in any database engine is like throwing out BTree and using only Hash.)
Until SSDs become as fast as RAM, there will be a performance gap.
As your you Question 2 (Does it make sense to use SSDs): If you are I/O-bound, then yes. Otherwise, it may not be "cost-effective". An RDBMS tries to cache everything, thereby eliminating I/O (for reads); if the data is small enough, the I/O speed does not matter.
On the other hand, writes are much more likely to be dependent on I/O speed. But that is another discussion (with a similar conclusion).
Responding to question update:
SELECT
where everything it needs is cached in ram: no I/O. A write in a transaction with setting at maximum security: always I/O. And there is lots of "in between".