Mysql – Speed up large database queries

indexinnodbMySQLpartitioningperformance

I am running a large local database with MySQL and InnoDB. The size is around 30GB for data and 6GB for Keys. The machine has 16GB of RAM.

The table looks like this

symbol VARCHAR(10),
iso_datestamp VARCHAR(14),
-- a lot more columns with values with mostly DECIMAL(26,6)s

I have BTREE indexes on the symbol and the iso_datestamp.

The operations I run are mostly updates which are run like this:

  1. DELETE FROM table WHERE symbol = 'XXX'
  2. INSERT INTO table (columns) VALUES (...)

This needs to be done to avoid data inconsistencies as opposed to just appending data.

I also frequently run analysis as in
SELECT FROM table WHERE symbol in ('XXX', 'YYY', ...) [AND iso_datestamp > '2015...']

Per symbol I have around 3000 entries.

The performance problem: As soon as the table starts to grow over around 10GB, the DELETE and INSERT statements slow down substantially from usually less than 0.4 seconds to well over 1 second even up to 5 seconds and more as the table keeps growing. RAM is not limiting (according to top).

I need the total of DELETE and INSERT to be around 1 second on average (of course faster is also fine)…

I am having 2 ideas right now:

  • Using hash indexes instead of btree indexes, since I am only doing equality operation
  • Partitioning the table using the symbol and the range

Are those viable strategies? Do both, just one? Other things I should do to improve the speed?

Best Answer

  • iso_datestamp to a MySQL TIMESTAMP so that it can be 5 bytes instead of 15.
  • Use InnoDB, not MyISAM. (This is critical to some of the items below.)
  • Use CHARACTER SET ascii (or latin1) unless you need the multibyte utf8/utf8mb4.
  • PRIMARY KEY(symbol, datestamp)
  • An auto_increment is a waste of space and indexing. The above is a 'natural PK' and it provides an efficient index.
  • PARTITION BY RANGE(TO_DAYS(datestamp)) -- this will significantly speed up deletion of 'old' rows.
  • With that PK and partitioning, you may not need any secondary indexes, thereby completely eliminating the 6GB you mentioned. (If you have queries that are not adequately handled, let me know.)
  • Shoot for no more than about 50 partitions; do not bother with subpartitions. (See link below)
  • Use DROP PARTITION for very efficient deleting by date. (See link below) If you are deleting by symbol, let me know.
  • DECIMAL(26,6) takes 12 bytes. Consider shrinking it and/or moving to an 8-byte DOUBLE. Caution: Double's ~16 significant digits and binary format may lead to rounding issues.
  • Consider using a 3-byte MEDIUMINT INT for a symbol_id and normalize the actual symbol name into another (small) table.
  • Metrics (eg, "alpha") should probably be 4-byte FLOATs. It's 7 significant digits is overkill for such.
  • Set innodb_buffer_pool_size to about 70% of available memory (after accounting for apps, web server, etc.)
  • Insert new data in batches of 100-1000. This runs literally 10 times as fast as one row at a time.

These changes should help with your various issues.

Currently, performance is falling off a cliff for multiple reasons. The above tips will delay the cliff for about twice as long, and make the cliff less steep. In some cases, the cliff is eliminated.

Also, my tips will cut back on the I/O needed -- by clustering, by locality of reference, by better caching, etc.

http://mysql.rjweb.org/doc.php/partitionmaint