Mysql – database and table design for billions of rows of data

database-designdatabase-recommendationMySQLmysql-5.7

basically i got 2 tables: header, details tables.

CREATE TABLE `header` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 `RECORD_DATE` datetime DEFAULT NULL,
 `TICKER_ID` int(11) DEFAULT NULL,
 `CURR_TIMESTAMP` datetime DEFAULT NULL,
 PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `detail` (
 `ID` int(11) NOT NULL AUTO_INCREMENT, 
 `HEADER_ID` int(11) DEFAULT NULL,
 `BROKER_ID` int(11) DEFAULT NULL,
 `AMOUNT` decimal(26,0) DEFAULT NULL,
 PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

There're about 4800 ticker need to be update everyday, ticker_id (A00001, B00032…) and each ticker_id contain numbers of record everyday and i store in detail table

it works fine at the beginning, after years, header become 2.4 million row and detail table got 250 millions row, its take an hour to with simple select,

SELECT h.ticker_id, h.record_date , d.broker_id, d.broker_id, d.amount
FROM DETAIL 
INNER JOIN herder h 
    ON h.id = d.header_id
where h.ticker = 'A00001'

so i create a 'link' table to join reference them together.

CREATE TABLE `linkA00001` (
 `ID` int(11) NOT NULL AUTO_INCREMENT, 
 `HEADER_ID` int(11) DEFAULT NULL,
 `DETAIL_ID` int(11) DEFAULT NULL, 
 PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

and the result become

SELECT h.ticker_id, h.record_date , d.broker_id, d.broker_id, d.amount
FROM DETAIL
INNER JOIN linkA00001 l
INNER JOIN herder h 
    ON h.id = l.header_id and d.id = l.detail_id

its stunning fast, from hour to few seconds, but this approach solve 1 ticker selection only, so i create 4800 link table for better performance, its do solve the selection problem, but i using java spring for my application, building over 4800 entity require 10 minutes to start the application and very hard to maintain the link table.

so my question is:

  1. this approach works?
  2. if not, how can i improve or just redesign this?
  3. according to this link, its require timestamp but i never insert at detail table, is it good practices add back?

Thank you very much!

Best Answer

To improve the query:

SELECT h.ticker_id, h.record_date , d.broker_id, d.broker_id, d.amount
FROM DETAIL 
INNER JOIN header h 
    ON h.id = d.header_id
WHERE
    h.ticker_id = 'A00001'

(ignoring that ticket_id is an INT in the table);

The aspect of the query that will limit the results most is the value of h.ticket_id. To find this quickly there need to be an index on this in table header.

 CREATE INDEX ticket ON header (TICKER_ID)

The database then need to find the HEADER_ID quickly in the detail table:

 CREATE INDEX header ON detail (HEADER_ID)

What is unclear in the question here is why detail has an auto_increment primary key. If this isn't used in any queries, and if HEADER_ID,BROKER_ID is unique, then perhaps those are a better primary key for the table and as an alternate to CREATE INDEX header above:

ALTER TABLE detail
DROP ID,
DROP PRIMARY KEY,
ADD PRIMARY KEY (HEADER_ID, BROKER_ID);

This will also speed up the query as primary keys are quicker to search than secondary keys (where non-index values like d.amount are also needed).