MySQL Performance Tuning for Very Large Databases

MySQLperformance

Let's say I have database "DB" and it has many tables.

Question 1.
If table A has 100 million rows of data and table B has 100000 rows, will database size as a whole impact performance if I do a select query on table B?

Question 2.
Table A` has 100 million rows of data for around 8000 products. Searching data for particular product on this table is very slow. Will performance increase if I instead create 8000 tables, 1 for each product and put respective data on respective table and make a search on that particular table? Does this have any caveats in a long run?

More information:

  • It's an InnoDB Table
  • Ram size is 27GB
  • Table A size (the largest table)
    • Rows=100.43 Million
    • Data Size= 13.62GB
    • Index Size= 7.21GB
    • Total Size = 20.83GB
  • Table B size
    • Table B could be various other tables but their size does not exceed 500MB

Table A is indexed and it has three indices on id (Pk), ProductCode, and AddedDate columns.

UPDATE

Table A

CREATE TABLE `ProductDetails` (
  `id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
  `DepartmentID` INT(10) DEFAULT NULL,
  `SellerID` INT(10) DEFAULT NULL,
  `ProductCode` VARCHAR(20) DEFAULT NULL,
  `WholesalerType` VARCHAR(5) DEFAULT NULL,
  `Remarks` TEXT DEFAULT NULL,
  `AddedDate` DATETIME DEFAULT NULL,
  `InsertedTimestamp` INT(10) UNSIGNED NOT NULL,
  `WholesalerID` TINYINT(3) UNSIGNED DEFAULT NULL,
  `ProductType` TINYINT(4) DEFAULT NULL,
  `ArrivalDate` DATETIME DEFAULT NULL,
  `SerialNumber` VARCHAR(4) DEFAULT NULL,
  `MessageID` INT(11) DEFAULT NULL,
  `Model` VARCHAR(10) NOT NULL DEFAULT 'NA'
  PRIMARY KEY (`id`),
  KEY `ProductCode` (`ProductCode`),
  KEY `AddedDate` (`AddedDate`)
) ENGINE=INNODB DEFAULT CHARSET=utf8 COMMENT='Engine should be InnoDB for performance'

Table B

CREATE TABLE `Products` (
  `ID` INT(11) NOT NULL AUTO_INCREMENT,
  `ProductCode` VARCHAR(20) NOT NULL DEFAULT '00A',
  `Created` DATETIME DEFAULT NULL,
  `LastEdited` DATETIME DEFAULT NULL,
  `Model` VARCHAR(15) CHARACTER SET utf8 DEFAULT NULL,
  `ProductType` VARCHAR(15) CHARACTER SET utf8 DEFAULT NULL,
  `StockItem` INT(11) NOT NULL DEFAULT '0',
  `StockThreshold` TINYINT(2) NOT NULL DEFAULT '10',
  `CostPrice` DOUBLE DEFAULT NULL,
  `SellingPrice` DOUBLE DEFAULT NULL,
  `ShortDescription` MEDIUMTEXT CHARACTER SET utf8,
  `Specification` TEXT CHARACTER SET utf8,
  `Description` TEXT CHARACTER SET utf8,
  `BrandID` INT(11) NOT NULL DEFAULT '0',
  `ManufacturerID` INT(11) NOT NULL DEFAULT '0',
  `ProductStatus` TINYINT(1) UNSIGNED NOT NULL DEFAULT '1',
  PRIMARY KEY (`ID`),
  KEY `ProductCode` (`ProductCode`)
) ENGINE=INNODB DEFAULT CHARSET=latin1

Example Query

SELECT * FROM ProductDetails WHERE (ProductCode = 'PR1920A' 
AND AddedDate > FROM_UNIXTIME(1519841700) 
AND AddedDate < FROM_UNIXTIME(1523934207)) 
AND ((ProductType in('14','17') 
AND MessageID = 28 
AND Model LIKE 'MBA-1201') 
OR (WholesalerType in('PARAM1','PARAM2','PARAM3','PARAM4','PARAM5') 
AND UPPER(Model) IN('RAGL21','BVRK10','TPL001','LTV2960','DEN7YZ','6YU90Q','47GER8','GB3049'))) 
AND DepartmentID != 0 AND SellerID != 0 
ORDER BY AddedDate;

UPDATE 2

Here are the result of the EXPLAIN

id   = 1
select_type =  SIMPLE
table = ProductDetails
type = ref
possible_keys = ProductCode, AddedDate
key = ProductCode
key_len = 63
ref = const
rows = 104456
Extra = Using index condition; Using where; Using filesort

And here is the SHOW INDEX results for ProductDetails table

enter image description here

Best Answer

Q1: A "database" is a collection of "tables". The performance of a query on one table (or a few tables JOINed together) is entirely on the table(s) in the query. Not other tables; not other databases.

Q2: Do not make a bunch of tables. If you have an index starting with ProductCode, then queries with WHERE ProductCode = 123 AND ... will be efficient. (This question has been asked and answered many times - always the same way.)

innodb_buffer_pool_size should be set to about 20G.

If you usually fetch by ProductCode, then change

PRIMARY KEY (`id`),
KEY `ProductCode` (`ProductCode`),
KEY `AddedDate` (`AddedDate`)

to

PRIMARY KEY(ProductCode, AddedDate, id),
INDEX(id),
KEY `AddedDate` (`AddedDate`)   -- (no change here)

That will 'cluster' all 12K (avg) rows for a given product together, making many SELECTs run faster.

Does Table B have 8K rows? Or 100K rows? If there is one row per product, then get rid of id and change

PRIMARY KEY (`ID`),
KEY `ProductCode` (`ProductCode`)

to

PRIMARY KEY(ProductCode)

BUG! You have inconsistent CHARACTER SETs between the tables. This can cause comparisons on ProductCode to not use an index! (And hence be sloooow.) (See ALTER TABLE .. CONVERT TO ..)

UPPER(Model) -- Don't use UPPER. You have a case-folding COLLATION, so you will get the desired test without need for UPPER().

When I suggested PRIMARY KEY(ProductCode, AddedDate, id) above, including AddedDate was not important. But looking at the SELECT, it is important, both in the WHERE and in the ORDER BY.