I'm creating indexes on a few tables I've recently created within MySQL.
The first (call below) completed within ~20 minutes and was applied to a table with a little over 20M records:
mysql> alter table fu.myTable add index(date), add index(id), add index(symbol), add index(method), algorithm=inplace, lock=none;
Query OK, 0 rows affected (20 min 20.52 sec)
Records: 0 Duplicates: 0 Warnings: 0
The second, applied to a much larger table consisting of nearly 1.1B rows, is taking considerably longer (~19 hours in at the moment). Sorting is obviously an O(n*log(n)) operation, so for a table 50x the size, by my calcs, processing on the larger table would be expected to take ~22 hours. Call I'm using for it below:
mysql> alter table op.myOtherTable add index(dataDate), add index(ticker), add index(symbol), add index(expDate), algorithm=inplace, lock=none;
With that, I wonder if there are any settings I may be able to adjust to speed things up some given my machine doesn't appear to be overly taxed at this point.
Edit:
The dataset consists of ~20 years of historical data and is parsed into daily CSVs (of ~+/-1M rows) to make it a little easier to work with. I've successfully gotten the data into MySQL but now need to apply indexes to make it usable. Fields I will likely search on are those I'm indexing on (dataDate, ticker, symbol, expDate). Additional detail below:
mysql> select table_rows from information_schema.tables where table_name = 'options';
+------------+
| table_rows |
+------------+
| 1097863845 |
+------------+
1 row in set (2.70 sec)
mysql> show create table op.options;
+---------+------------------------------------------------------------+
| Table | Create Table |
+---------+------------------------------------------------------------+
| options | CREATE TABLE `options` (
`dataDate` date NOT NULL,
`ticker` varchar(8) NOT NULL,
`symbol` varchar(30) NOT NULL,
`expDate` date NOT NULL,
`type` char(4) NOT NULL,
`price` double DEFAULT NULL,
`strike` double DEFAULT NULL,
`last` double DEFAULT NULL,
`bid` double DEFAULT NULL,
`ask` double DEFAULT NULL,
`volume` int(11) DEFAULT NULL,
`OI` int(11) DEFAULT NULL,
`IV` double DEFAULT NULL,
`delta` double DEFAULT NULL,
`gamma` double DEFAULT NULL,
`theta` double DEFAULT NULL,
`vega` double DEFAULT NULL,
`aka` varchar(30) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+----------------------------------------------------------------------+
1 row in set (0.01 sec)
Problem table layout sample:
mysql> select * from op.options limit 10;
+------------+--------+------------------+------------+------+-------+--------+------+------+------+--------+------+--------+---------+--------+---------+--------+------+
| dataDate | ticker | optionSymbol | expDate | type | price | strike | last | bid | ask | volume | OI | IV | delta | gamma | theta | vega | aka |
+------------+--------+------------------+------------+------+-------+--------+------+------+------+--------+------+--------+---------+--------+---------+--------+------+
| 2003-02-26 | A | A030322C00010000 | 2003-03-22 | call | 12.8 | 10 | 3.7 | 2.8 || 3.1 | 0 | 66 | 0.8389 | 0.8987 | 0.0653 | -3.8774 | 0.5735 | ACB
| 2003-02-26 | A | A030322P00010000 | 2003-03-22 | put | 12.8 | 10 | 0.15 | 0. || 0.2 | 0 | 1436 | 0.8448 | -0.1027 | 0.0654 | -3.8059 | 0.5789 | AOB
| 2003-02-26 | A | A030322C00012500 | 2003-03-22 | call | 12.8 | 12.5 | 1.1 | 0.9 || 1.05 | 0 | 6274 | 0.6549 | 0.5914 | 0.1833 | -6.5268 | 1.2571 | ACV
| 2003-02-26 | A | A030322P00012500 | 2003-03-22 | put | 12.8 | 12.5 | 0.6 | 0.6 || 0.75 | 75 | 929 | 0.6644 | -0.4089 | 0.1807 | -6.4545 | 1.2573 | AOV
| 2003-02-26 | A | A030322C00015000 | 2003-03-22 | call | 12.8 | 15 | 0.15 | 0. || 0.2 | 45 | 903 | 0.5812 | 0.1586 | 0.1286 | -3.5849 | 0.7831 | ACC
| 2003-02-26 | A | A030322P00015000 | 2003-03-22 | put | 12.8 | 15 | 2.15 | 2.2 || 2.4 | 0 | 107 | 0.558 | -0.8526 | 0.1276 | -3.0784 | 0.7459 | AOC
| 2003-02-26 | A | A030322C00017500 | 2003-03-22 | call | 12.8 | 17.5 | 0.05 | || 0.1 | 0 | 439 | 0.7012 | 0.0474 | 0.0435 | -1.7616 | 0.3198 | ACW
| 2003-02-26 | A | A030322P00017500 | 2003-03-22 | put | 12.8 | 17.5 | 5 | 4. || 4.8 | 0 | 215 | 0.558 | -0.9838 | 0.0224 | -0.3395 | 0.1307 | AOW
| 2003-02-26 | A | A030322C00020000 | 2003-03-22 | call | 12.8 | 20 | 0.05 | || 0.05 | 0 | 224 | 0.8113 | 0.0193 | 0.0179 | -0.9668 | 0.1519 | ACD
| 2003-02-26 | A | A030322P00020000 | 2003-03-22 | put | 12.8 | 20 | 7.1 | 7. || 7.3 | 0 | 6 | 0.558 | -0.999 | 0.0019 | 0.2187 | 0.011 | AOD
+------------+--------+------------------+------------+------+-------+--------+------+------+------+--------+------+--------+---------+--------+---------+--------+------+
10 rows in set (0.44 sec)
Best Answer
Then...
Leave the indexes in place. The table is growing by only 0.1% each day. It would be faster to incrementally update the indexes than to drop and rebuild them.
innodb_buffer_pool_size
at a suitable size. (How much RAM do you have. The simple default of 70% of RAM may be close to optimal.) (The Change buffer is in the buffer_pool.)Schema
For a billion-row table, space is important...
symbol
andticker
? Do you need both? Consider normalizing the pair.DOUBLE
(8 bytes, 16 significant digits) is probably overkill forgamma
, etc. ConsiderFLOAT
(4 bytes, 7 significant digits)volume
can exceed a 32-bitINT
(2B limit) or evenINT UNSIGNED
(4B limit)id
. If it is anAUTO_INCREMENT
you add later, then I suggest you are risking overflow if it isINT
.PRIMARY KEY
would be(date, symbol_id)
or possibly the opposite order.SELECTs
do you have? If you are fetching "all info for one ticker",(symbol_id, date)
is optimal. If you are talking about "today's data" then the opposite is better.type
could be anENUM
(1 byte)INDEX(a), INDEX(b)
is not the same asINDEX(a,b)
. The latter is significantly better in some situations. For querying a billion rows, you need to understand the differences. http://mysql.rjweb.org/doc.php/index_cookbook_mysql