Mysql – Improving MySQL query performance for a large database

join;MySQLperformancequery-performance

have a simple query that do a join between two table. The main table has a date field that my query filter by a range of dates. But for a month period my main table has 37412 datas and for this values has others 914 794 items in child table. That just for a month period, for all time the child table has more than 35 975 568 items.

A simple join, like that, filtering by a range date is so slow:

select  count(i.BrandId) from projectitem i  
inner join project p on i.ProjectId = p.Id 
where (p.Date between "2019-07-01 00:00:00" AND "2019-07-30 23:59:59");

query explain

That is the struct of tables:

    'CREATE TABLE `project` (
      `Id` char(36) NOT NULL,
      `Url` varchar(10) DEFAULT NULL,
      `Region` varchar(2) NOT NULL,
      `Area` int(11) NOT NULL,
      `Name` varchar(250) DEFAULT NULL,
      `Description` text,
      `AccountId` char(36) DEFAULT NULL,
      `UserId` char(36) DEFAULT NULL,
      `Date` datetime DEFAULT NULL,
      `ModifiedDate` datetime DEFAULT NULL,
      `DeletedDate` datetime DEFAULT NULL,
      `Deleted` tinyint(4) NOT NULL,
      `Likes` int(11) NOT NULL,
      `Views` int(11) NOT NULL,
      `Private` tinyint(4) NOT NULL,
      `OnlyBudget` tinyint(4) NOT NULL,
      PRIMARY KEY (`Id`),
      KEY `dateproject` (`Date`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci'


'CREATE TABLE `projectitem` (
  `Id` char(36) NOT NULL,
  `BrandId` char(36) DEFAULT NULL,
  `SpecificationId` char(36) NOT NULL,
  `ProjectId` char(36) NOT NULL,
  PRIMARY KEY (`Id`,`ProjectId`,`SpecificationId`),
  KEY `project_item_key_idx` (`ProjectId`),
  KEY `brand_idx` (`BrandId`),
  KEY `spec_item_key_idx` (`SpecificationId`),
  CONSTRAINT `project_item_key` FOREIGN KEY (`ProjectId`) REFERENCES `project` (`Id`),
  CONSTRAINT `spec_item_key` FOREIGN KEY (`SpecificationId`) REFERENCES `specification` (`Id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci'

How I can improve that? I thing that one reason for the bad performance is because the lenght of my key in "projectitem" table.

I migrated that data from another database. Than, first a created all tables without keys and after the migration I used that command for create keys and indexes:

 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0;
SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0;
SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='TRADITIONAL,ALLOW_INVALID_DATES';

ALTER TABLE `projectitemreference` 
ADD PRIMARY KEY (`Id`, `ProjectId`, `ProjectItemId`, `SpecificationId`),
ADD INDEX `reference_item_key_idx` (`ProjectId` ASC, `ProjectItemId` ASC, `SpecificationId` ASC) VISIBLE
, LOCK = NONE;

ALTER TABLE `projectitemreference` 
ADD CONSTRAINT `reference_item_key`
  FOREIGN KEY (`ProjectId` , `ProjectItemId` , `SpecificationId`)
  REFERENCES .`projectitem` (`ProjectId` , `Id` , `SpecificationId`)
  ON DELETE NO ACTION
  ON UPDATE NO ACTION,
ADD CONSTRAINT `reference_project_key`
  FOREIGN KEY (`ProjectId`)
  REFERENCES .`project` (`Id`)
  ON DELETE NO ACTION
  ON UPDATE NO ACTION, LOCK = NONE;

SET SQL_MODE=@OLD_SQL_MODE;
SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS;
SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS;

Can it be a problem?

Best Answer

count(i.BrandId) checks BrandId for being not null before counting. If you don't need that check, change to COUNT(*) to count the number of rows in the JOIN. Or perhaps you wanted COUNT(DISTINCT i.BrandId) to count the number of different brands?

Then these indexes will help:

project:  INDEX(Date)
projectitem:  INDEX(projectid)           -- if using COUNT(*)
projectitem:  INDEX(projectid, BrandId)  -- if explicitly COUNTing on BrandId

When you dataset gets bigger than RAM, the GUIDs will kill performance due to the random accesses blowing out the cache.

Be aware that any change, even a minor change, to those queries may necessitate reevaluation of the indexes.