Mysql – database and table design for billions of rows of data

Tags: database-design, database-recommendation, MySQL, mysql-5.7

Basically I have 2 tables: header and detail.

CREATE TABLE `header` (
 `ID` int(11) NOT NULL AUTO_INCREMENT,
 `RECORD_DATE` datetime DEFAULT NULL,
 `TICKER_ID` int(11) DEFAULT NULL,
 `CURR_TIMESTAMP` datetime DEFAULT NULL,
 PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `detail` (
 `ID` int(11) NOT NULL AUTO_INCREMENT, 
 `HEADER_ID` int(11) DEFAULT NULL,
 `BROKER_ID` int(11) DEFAULT NULL,
 `AMOUNT` decimal(26,0) DEFAULT NULL,
 PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

There are about 4800 tickers that need to be updated every day, ticker_id (A00001, B00032…), and each ticker_id has a number of records every day, which I store in the detail table.

It worked fine at the beginning, but after several years header has grown to 2.4 million rows and detail to 250 million rows, and a simple select now takes an hour:

SELECT h.ticker_id, h.record_date, d.broker_id, d.amount
FROM detail d
INNER JOIN header h
    ON h.id = d.header_id
WHERE h.ticker_id = 'A00001'

So I created a 'link' table to reference them together:

CREATE TABLE `linkA00001` (
 `ID` int(11) NOT NULL AUTO_INCREMENT, 
 `HEADER_ID` int(11) DEFAULT NULL,
 `DETAIL_ID` int(11) DEFAULT NULL, 
 PRIMARY KEY (`ID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

and the query became:

SELECT h.ticker_id, h.record_date, d.broker_id, d.amount
FROM detail d
INNER JOIN linkA00001 l
    ON d.id = l.detail_id
INNER JOIN header h
    ON h.id = l.header_id

It's stunningly fast: from an hour down to a few seconds. But this approach only handles a single ticker, so I created 4800 link tables. That does solve the selection problem, but I'm using Java Spring for my application, building over 4800 entities makes the application take 10 minutes to start, and the link tables are very hard to maintain.

So my questions are:

Does this approach work?
If not, how can I improve it, or should I just redesign it?
According to this link, it requires a timestamp, but I never insert one into the detail table. Is it good practice to add it back?

Thank you very much!

Best Answer

To improve the query:

SELECT h.ticker_id, h.record_date, d.broker_id, d.amount
FROM detail d
INNER JOIN header h
    ON h.id = d.header_id
WHERE
    h.ticker_id = 'A00001'

(ignoring that ticker_id is an INT in the table).

The part of the query that limits the results most is the value of h.ticker_id. To find matching rows quickly, there needs to be an index on that column in the header table:

 CREATE INDEX ticker ON header (TICKER_ID);

The database then needs to find the matching HEADER_ID values quickly in the detail table:

 CREATE INDEX header ON detail (HEADER_ID);
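To see the effect of these two indexes without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. It is an analogue under stated assumptions, not MySQL itself: the schema is simplified, TICKER_ID is declared TEXT so that 'A00001' matches, and the index names idx_header_ticker and idx_detail_header are hypothetical (SQLite does not allow an index to reuse a table's name). EXPLAIN QUERY PLAN should then report an index search on both tables instead of a full scan of detail.

```python
import sqlite3

# SQLite stand-in for the MySQL schema in the question (an assumption, not MySQL).
# TICKER_ID is TEXT here so that the 'A00001' literal compares as intended.
SCHEMA = """
CREATE TABLE header (
    ID INTEGER PRIMARY KEY,
    RECORD_DATE TEXT,
    TICKER_ID TEXT,
    CURR_TIMESTAMP TEXT
);
CREATE TABLE detail (
    ID INTEGER PRIMARY KEY,
    HEADER_ID INTEGER,
    BROKER_ID INTEGER,
    AMOUNT NUMERIC
);
-- Hypothetical index names: in SQLite an index may not reuse a table's name.
CREATE INDEX idx_header_ticker ON header (TICKER_ID);
CREATE INDEX idx_detail_header ON detail (HEADER_ID);
"""

QUERY = """
SELECT h.TICKER_ID, h.RECORD_DATE, d.BROKER_ID, d.AMOUNT
FROM detail d
INNER JOIN header h ON h.ID = d.HEADER_ID
WHERE h.TICKER_ID = ?
"""


def build(conn: sqlite3.Connection) -> None:
    """Create the tables and indexes, then add one header row with one detail row."""
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO header VALUES (1, '2024-01-02', 'A00001', NULL)")
    conn.execute(
        "INSERT INTO detail (HEADER_ID, BROKER_ID, AMOUNT) VALUES (1, 7, 1000)"
    )


def query_plan(conn: sqlite3.Connection, ticker: str) -> list[str]:
    """Return the human-readable detail column of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (ticker,))
    return [row[3] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for line in query_plan(conn, "A00001"):
        print(line)  # should be SEARCH ... USING INDEX lines rather than a SCAN of d
    print(conn.execute(QUERY, ("A00001",)).fetchall())
```

The plan text and cost model differ between SQLite and MySQL; only the idea (seek by TICKER_ID, then seek by HEADER_ID) carries over.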

What is unclear from the question is why detail has an AUTO_INCREMENT primary key. If ID isn't used in any queries, and if (HEADER_ID, BROKER_ID) is unique, then that pair may be a better primary key for the table, as an alternative to the CREATE INDEX header statement above:

ALTER TABLE detail
DROP ID,
DROP PRIMARY KEY,
ADD PRIMARY KEY (HEADER_ID, BROKER_ID);

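This will also speed up the query, because primary keys are quicker to search than secondary keys when non-indexed values like d.amount are also needed: InnoDB stores each row inside its primary-key (clustered) index, so a primary-key lookup reaches d.amount directly, whereas a secondary-index lookup must make a second lookup into the primary key to fetch it.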
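The clustered-primary-key idea can be sketched the same way. This is again a hedged analogue in Python's sqlite3 rather than InnoDB: a WITHOUT ROWID table stores its rows in primary-key order, which resembles how InnoDB keeps rows in the clustered index, so a lookup on HEADER_ID can read AMOUNT straight from the primary-key B-tree. It assumes, as the answer does, that (HEADER_ID, BROKER_ID) is unique; the helper names and sample rows are hypothetical.

```python
import sqlite3

# SQLite stand-in for the proposed detail table (an assumption, not MySQL/InnoDB).
# WITHOUT ROWID stores rows ordered by the composite primary key, similar in
# spirit to InnoDB's clustered index.
DETAIL_DDL = """
CREATE TABLE detail (
    HEADER_ID INTEGER NOT NULL,
    BROKER_ID INTEGER NOT NULL,
    AMOUNT NUMERIC,
    PRIMARY KEY (HEADER_ID, BROKER_ID)
) WITHOUT ROWID
"""

LOOKUP = "SELECT BROKER_ID, AMOUNT FROM detail WHERE HEADER_ID = ? ORDER BY BROKER_ID"


def build(conn: sqlite3.Connection) -> None:
    """Create the table and insert a few hypothetical rows."""
    conn.execute(DETAIL_DDL)
    conn.executemany(
        "INSERT INTO detail VALUES (?, ?, ?)",
        [(1, 7, 1000), (1, 9, 250), (2, 7, 40)],
    )


def lookup_plan(conn: sqlite3.Connection, header_id: int) -> list[str]:
    """Detail text of each EXPLAIN QUERY PLAN row for the HEADER_ID lookup."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (header_id,))
    return [row[3] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(lookup_plan(conn, 1))  # should mention a search USING PRIMARY KEY
    print(conn.execute(LOOKUP, (1,)).fetchall())
    try:
        # A second row for the same (HEADER_ID, BROKER_ID) pair is rejected.
        conn.execute("INSERT INTO detail VALUES (1, 7, 5)")
    except sqlite3.IntegrityError as exc:
        print("rejected:", exc)
```

The uniqueness check at the end is the practical test of the answer's precondition: if the real data contains duplicate (HEADER_ID, BROKER_ID) pairs, the composite key cannot be added.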

Related Solutions

Mysql – Are two indexes needed

An index can seek by a subset of characters, as long as you're searching from the left. E.g., "Inter%" can seek, "%net" will not.

However, the first character is not necessarily the character under which the article would be sorted. "The Internet" should go under "I", not "T". You probably need two fields, DisplayTitle and SortTitle; a single-character index on the latter may be worthwhile, but most likely a full-length index will be just fine.

Indexes are typically B-trees, and a seek will jump to the right location about equally quickly whether you have 10 or 100 entries per page. Scans are another matter, but I'd start with the simplest solution and add an extra index only if performance proves inadequate in practice.
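The left-prefix rule can be illustrated with a sorted Python list standing in for a B-tree (a simplification: a real index is a tree of pages, but it is likewise kept in key order). A prefix such as "Inter" lets you binary-search to the first candidate and read forward only while keys still match; a suffix such as "net" gives no starting point, so every key must be checked. The sort_title helper and its list of leading articles are hypothetical, illustrating the SortTitle idea, and the comparisons here are case-sensitive, unlike MySQL's default collations.

```python
from bisect import bisect_left

# Hypothetical list of leading articles to move out of the way for sorting.
ARTICLES = ("The ", "A ", "An ")


def sort_title(display_title: str) -> str:
    """Derive a SortTitle so that 'The Internet' files under 'I', not 'T'."""
    for article in ARTICLES:
        if display_title.startswith(article):
            return display_title[len(article):]
    return display_title


def prefix_seek(sorted_keys: list[str], prefix: str) -> list[str]:
    """Like LIKE 'Inter%': seek to the first key >= prefix, then scan while it matches."""
    start = bisect_left(sorted_keys, prefix)
    end = start
    while end < len(sorted_keys) and sorted_keys[end].startswith(prefix):
        end += 1
    return sorted_keys[start:end]


def suffix_scan(sorted_keys: list[str], suffix: str) -> list[str]:
    """Like LIKE '%net': the sort order does not help, so every key is examined."""
    return [key for key in sorted_keys if key.endswith(suffix)]


if __name__ == "__main__":
    titles = ["The Internet", "Interstellar", "Networks", "Intranet"]
    index = sorted(sort_title(t) for t in titles)
    print(index)                       # ['Internet', 'Interstellar', 'Intranet', 'Networks']
    print(prefix_seek(index, "Inter"))  # ['Internet', 'Interstellar']
    print(suffix_scan(index, "net"))    # ['Internet', 'Intranet']
```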

Mysql – Finding rows for a specified date range

Another way to get the result is the query below. It first finds all groups that the teacher has definitely taught (or is going to teach) by checking that she started within the month; then, in another subquery, it finds, for every group, the last teacher who started on the first day of the month or earlier.

With the unique index you have on the table, the second subquery should be quite efficient. The first subquery would benefit from an index on (teacherid, startdate, groupid):

SELECT groupid
FROM pupilgroupteacher
WHERE teacherid = @teacher 
  AND startdate >= @month + INTERVAL 1 DAY
  AND startdate < @month + INTERVAL 1 MONTH

UNION DISTINCT

SELECT gg.groupid
FROM 
    ( SELECT DISTINCT groupid
      FROM pupilgroupteacher
    ) AS gd
  JOIN pupilgroupteacher AS gg
    ON  gg.groupid = gd.groupid
    AND gg.startdate =
        ( SELECT MAX(gi.startdate)
          FROM pupilgroupteacher AS gi 
          WHERE gi.groupid = gd.groupid
            AND gi.startdate < @month + INTERVAL 1 DAY
        )
WHERE gg.teacherid = @teacher ;
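To check the logic of the two UNION branches, here is a minimal in-memory Python sketch of the same query. It assumes, as the query does, that @month holds the first day of the month as a DATE, and that each row is a (groupid, teacherid, startdate) tuple; the function names and sample rows are hypothetical. It replaces the indexes with plain loops, so it mirrors the result only, not the performance.

```python
from datetime import date, timedelta

Row = tuple[int, int, date]  # (groupid, teacherid, startdate)


def first_of_next_month(month: date) -> date:
    """Equivalent of @month + INTERVAL 1 MONTH when @month is the 1st."""
    return date(month.year + month.month // 12, month.month % 12 + 1, 1)


def groups_taught(rows: list[Row], teacher: int, month: date) -> set[int]:
    day_after = month + timedelta(days=1)    # @month + INTERVAL 1 DAY
    next_month = first_of_next_month(month)  # @month + INTERVAL 1 MONTH

    # First branch: the teacher starts a group during the month (after day 1).
    started = {g for g, t, s in rows if t == teacher and day_after <= s < next_month}

    # Second branch: for each group, the latest start on or before day 1
    # (startdate < @month + 1 day) must belong to this teacher.
    current = set()
    for group in {g for g, _, _ in rows}:
        earlier = [s for g, _, s in rows if g == group and s < day_after]
        if not earlier:
            continue
        latest = max(earlier)
        if any(g == group and t == teacher and s == latest for g, t, s in rows):
            current.add(group)

    return started | current  # UNION DISTINCT


if __name__ == "__main__":
    rows = [
        (1, 10, date(2024, 1, 15)),
        (1, 20, date(2024, 3, 10)),
        (2, 20, date(2024, 2, 1)),
        (2, 10, date(2024, 3, 1)),
        (3, 10, date(2024, 4, 5)),
    ]
    print(groups_taught(rows, 10, date(2024, 3, 1)))  # {1, 2}
    print(groups_taught(rows, 20, date(2024, 3, 1)))  # {1}
```

Teacher 10 gets group 1 (taught until 20 takes over on March 10) and group 2 (started on March 1), while group 3 only starts in April; teacher 20 gets group 1 only, because she handed group 2 over on March 1.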
