Mysql – Which Database to choose: MySQL or Oracle

blobcompressionMySQLoraclepostgresql

I have an application generating time series data (32 channels, 22KHz, 6 secs) once a minute. These have to be stored in a database. At the moment I'm using MySQL with InnoDB tables (file_per_table,file_format=Barracuda).

As the data can be compressed, I'm using a medium-blob column with compression enabled. In compressed form, that would generate about 1TB a year. So far, so good.

But is this the optimal selection? I want to use a free database – but of course there are alternatives like Oracle or maybe PostgreSQL.
The target machine running the DB would be Debian stable. I'm not sure either which file system would be best.

I'm open for any suggestions.

Best Answer

Which db to choose? I would choose PostgreSQL having worked with MySQL and PostgreSQL but it is worth noting how different the databases are. I will say I have frequently been impressed (and only rarely disappointed) by what sort of abuse I can throw at PostgreSQL only to watch things be handled gracefully.

In your specific case, however, there may be reasons to look specifically at PostgreSQL. The major issue is programmability and advanced indexing but there are other reasons too (basically in InnoDB your table is contained in the pkey index, while in PostgreSQL you can do a physical-order scan of the table (and this has advantages when pulling a lot of rows). Major disadvantages might include not liking the binary type interfaces (and often saving/retrieving the values can use more memory than I would like). As for filesystem I know a lot of people who are very happy with XFS and also with ZFS.

Finally one specific tradeoff here is the Pg 8k data page size. It sounds like your time series would be larger than this, and so would be TOASTed (moved to extended storage). This speeds up many queries, but it also slows down retrieving the specific time series slightly because extra physical files must be scanned.

As for programmability, basically you could write functions that can look inside a time series for specific useful information and then you can index the output of this function. Consequently this can be used to speed up searches because the function doesn't have to be run.