Database Performance – Do SSDs Reduce the Usefulness of Databases?

database-design · hardware · index · performance · ssd

I only heard about Robert Martin today, and he seems to be a notable figure in the software world, so I don't mean for my title to come across as clickbait or as putting words in his mouth; this is simply how I interpreted what I heard, given my limited experience and understanding.

I was watching a video today of a talk by Robert C. Martin on software architecture, and in the latter half of the video, databases were the main focus.

From my understanding of what he said, it seemed like he was saying that SSDs will reduce the usefulness of databases (considerably).

To explain how I came to this interpretation:

He discussed how retrieving data from HDDs/spinning disks is slow, but noted that these days we use SSDs. He starts off with "RAM is coming," then mentions RAM disks, but says he can't call it a RAM disk and resorts to just saying RAM. So with RAM, we don't need indexes, because every byte takes the same time to fetch. (This paragraph is paraphrased by me.)

So, his suggesting RAM (as in main memory) as a replacement for DBs (which is how I interpreted his statement) doesn't make sense to me, because that would mean all the records are kept and processed in memory for the lifetime of an application (unless you pull from a disk file on demand).

So, I resorted to thinking that by RAM he means SSD. In that case, he's saying SSDs reduce the usefulness of databases. He even says, "If I was Oracle, I'd be scared. The very foundation of why I exist is evaporating."

From my limited understanding of SSDs, unlike HDDs, where seek time depends on physically moving the head and waiting for the platter to rotate, SSDs offer near-O(1), essentially random access. So his suggestion was interesting to me, because I'd never thought about it like that.
When I was first introduced to databases a few years ago, a professor described their benefits over a plain filesystem,
and I concluded that the primary role of a database is essentially to be a heavily indexed filesystem (plus optimizations, caching, concurrent access, etc.). Thus, if indexes aren't needed on SSDs, that does make databases less useful.

Regardless of that, and prefacing that I'm a newbie, I find it hard to believe that databases become less useful: everyone still uses a DB as the primary data store for their application rather than the raw filesystem, and I felt he was oversimplifying the role of databases.

Note: I did watch to the end to make sure he didn't say something different.

For reference:
42:22 is when the whole database topic comes up;
43:52 is when he asks, "Why do we even have databases?"

This answer does say SSDs speed DBs up considerably.
This question asks about how optimization is changed.

TL;DR: does the advent of widespread SSD use in the server market (whether it's upcoming or has happened already) reduce the usefulness of databases?

It seemed like what the presenter was trying to convey was that with SSDs, one can store the data on disk and not worry about how slow it would be to retrieve, as with older HDDs, because SSD seek times are near O(1) (I think). If that were true, databases would hypothetically lose one of their advantages, indexing, because the benefit of having indexes for faster lookups would be gone.

Best Answer

There are some things in a database that should be tweaked when you use SSDs. For instance, speaking for PostgreSQL, you can adjust effective_io_concurrency and random_page_cost. However, faster reads and faster random access aren't what a database is for. A database ensures things like durability, consistency, and safe concurrent access, and it organizes your data for the queries you actually run.
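As a sketch of that tuning, here is roughly what adjusting those two PostgreSQL settings for SSD storage might look like (the values are illustrative, not recommendations; tune for your own hardware):

```sql
-- random_page_cost defaults to 4.0, which models spinning disks where
-- random reads are much more expensive than sequential ones. On SSDs
-- the gap nearly vanishes, so a value close to seq_page_cost (1.0) is common:
ALTER SYSTEM SET random_page_cost = 1.1;

-- effective_io_concurrency tells the planner how many concurrent I/O
-- requests the storage can service; SSDs handle far more than HDDs:
ALTER SYSTEM SET effective_io_concurrency = 200;

-- Reload the configuration so the new settings take effect:
SELECT pg_reload_conf();
```

Note that this tuning changes how the planner *costs* index use versus sequential scans; it doesn't remove the need for indexes.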

He's just wrong about indexes. Even if the whole table can be read into RAM, an index is still useful. Don't believe me? Let's do a thought experiment.

  • Imagine you have a table with one indexed column.

    CREATE TABLE foobar ( id text PRIMARY KEY );
    
  • Imagine that there are 500 million rows in that table.

  • Imagine all 500 million rows are concatenated together into a file.

What's faster:

  1. grep 'keyword' file
  2. SELECT * FROM foobar WHERE id = 'keyword'

It's not just about where the data is, it's about how you order it and what operations you can do on it. PostgreSQL supports B-tree, Hash, GiST, SP-GiST, GIN, and BRIN indexes (and Bloom through an extension). You'd be foolish to think that all of that math and functionality goes away just because you have faster random access.
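You can see the difference in PostgreSQL itself. A sketch of how one might compare the two plans for the foobar table above (the SET statements are for illustration only; they force the planner to behave as if no index existed):

```sql
-- With the primary-key index available, the planner can descend a
-- B-tree: O(log n) page visits, even when everything is cached in RAM.
EXPLAIN SELECT * FROM foobar WHERE id = 'keyword';

-- Now pretend the index doesn't exist (illustration only; don't do
-- this in production):
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- The planner falls back to a sequential scan: every one of the
-- 500 million rows is compared, the moral equivalent of grep.
EXPLAIN SELECT * FROM foobar WHERE id = 'keyword';
```

The grep approach is O(n) no matter how fast the bytes arrive; the index lookup is O(log n). Faster storage shrinks the constant factor, not the asymptotic gap.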