I have a database table (created by someone else). This table consists over billions of records and records are being inserted every second or so.
I need to optimize this table and hence the queries to fetch stuffs faster. Following is the table ProductCatalog
structure.
id int(10)
SerialNumber varchar(20)
BasePrice decimal(4,1)
BatchCode tinyint(3)
Type varchar(5)
ItemCode varchar(5)
ArrivalDate datetime
InsertTimestamp int(10)
BrandID tinyint(3)
CompanyID tinyint(4)
Model varchar(10)
Description text
Here is the script to create table
CREATE TABLE `ProductCatalog` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`SerialNumber` varchar(20) DEFAULT NULL,
`BasePrice` decimal(4,1) DEFAULT NULL,
`BatchCode` tinyint(3) unsigned DEFAULT NULL,
`Type` varchar(5) DEFAULT NULL,
`ItemCode` varchar(5) DEFAULT NULL,
`ArrivalDate` datetime DEFAULT NULL,
`InsertTimestamp` int(10) unsigned NOT NULL,
`BrandID` tinyint(3) unsigned DEFAULT NULL,
`Model` varchar(10) NOT NULL DEFAULT 'RX209',
`Description` text,
PRIMARY KEY (`id`),
KEY `SerialNumber` (`SerialNumber`,`ArrivalDate`,`ItemCode`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
Here are the sample insert queries.
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('1','345618761','23','25','ACC','ABCD','2021-02-14 12:24:44','1613371259','37','RX209',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('2','345618761','24','25','RAF','ABCD','2021-02-15 12:55:17','1613373031','45','GA317',NULL);
insert into `productcatalog` (`id`, `SerialNumber`, `BasePrice`, `BatchCode`, `Type`, `ItemCode`, `ArrivalDate`, `InsertTimestamp`, `BrandID`, `Model`, `Description`) values('3','569014575','43','21','DAT','TPRS','2021-02-13 12:56:34','1613373082','34','PX452',NULL);
There could be many many entries for same SerialNumber
and ItemCode
in Different ArrivalDate
Initially there were three indexes
1. id => Primary
2. SerialNumber
3. ArrivalDate
Following are the queries i run against this table.
SELECT *
FROM ProductCatalog
WHERE SerialNumber='1234567890'
AND ItemCode!="ABCD"
ORDER BY id DESC LIMIT 1; //This Query Seems slower
SELECT BasePrice
FROM ProductCatalog
WHERE SerialNumber='123456789'
AND ItemCode!="ABCD"
and ItemCode!="PQRS"
AND ItemCode!="MNOP"
ORDER BY ID Desc LIMIT 1 //This Query Seems Slower
SELECT *
FROM ProductCatalog
WHERE SerialNumber='123456789'
AND (ArrivalDate>='2019-01-01 00:00:00' AND ArrivalDate<='2020-12-31 23:59:59')
AND ItemCode='ABCD'
ORDER BY ArrivalDate ASC //This query looks ok
SELECT BatchCode
FROM ProductCatalog
WHERE SerialNumber='123456789'
AND ItemCode!="ABCD"
and ItemCode!="PQRS"
AND ItemCode!="MNOP"
ORDER BY ID Desc LIMIT 1 //This Query Seems Slower
Then I changed the indexes such that we have only two indexes now. The primary and the composite one.
1. id => Primary
2. SerialNumber, ArrivalDate, ItemCode
Mysql Information
MySQL Version 5.7
Engine: InnoDB
Problem
The results are still not that satisfactory.
- Are the indexes I changed correct to get performance gain?
- Are the order of
columns
in index Correct? - Column
SerialNumber
contains 16 digit numeric value, shall i changed it toint
instead ofvarchar
to gain performance?
Best Answer
"billions of records" and
id int(10) unsigned NOT NULL AUTO_INCREMENT
-- Beware! That will top out at about 4 billion.To speed up the inserts, see if you can batch them something liek this:
For the
SELECTs
, then indexes (with the columns in the order specified) should help:Query 1 will (I think) use
SerialNumber, ID
from the first index and stop after one or two rows.Query 2 is "covering", so even it needs to scan several rows, it would have to bounce between the index's BTree and the data's BTree.
Query 3 -- The
=
columns need to be first and the "range" column (Arrival Date
) last.The following changes won't improve performance; I just think they are clearer. Instead of
I prefer these:
The following will explain some of my suggestions: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
This seems odd for a "price":
decimal(4,1)