MySQL performance issue with millions of rows and GROUP BY

MySQL

I have three tables with more than 1 million rows each, and one of my queries takes more than 30 seconds over a date range of 3 months. I have tried multiple indexes and I believe I have respected the column order in them. I don't know what else to do to get better performance.

SELECT
    MAX(table2.id) AS item_id,
    MAX(table2.sampled_inspected) AS sample_size,
    table1.group_col AS ass_group_id
FROM table1
INNER JOIN table2 ON table1.id = table2.assignment_id
LEFT OUTER JOIN table3 ON table1.po_id = table3.id
LEFT JOIN table4 ON table4.id = table3.supplier_id
LEFT JOIN table5 ON table5.id = table3.project_id
WHERE table1.recycled = 0
    AND table2.inspection_status_id > 0
    AND table2.inspection_status_id != 50
    AND table3.importer_id = 215
    AND table5.recycled = 0
    AND table3.project_id IN ('2062','2063','2064','2065')
    AND table2.inspection_completed_date >= DATE('2019-10-01')
    AND table2.inspection_completed_date <= DATE('2020-01-29')
GROUP BY ass_group_id

and here are my 5 tables:

CREATE TABLE `table1` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `recycled` int(1) NOT NULL DEFAULT '0',
  `group_col` varchar(25) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `po_id` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `index_assignment_po` (`po_id`),
  KEY `recycled` (`recycled`,`group_col`),
  KEY `group_col` (`group_col`)
) ENGINE=InnoDB AUTO_INCREMENT=3324705 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;



CREATE TABLE `table2` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `assignment_id` int(10) unsigned DEFAULT NULL,
  `sampling_size` int(11) DEFAULT NULL,
  `notes` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `photo_report` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `inspection_status_id` int(11) DEFAULT '0',
  `inspection_result_id` int(1) NOT NULL DEFAULT '0',
  `inspection_completed_date` datetime DEFAULT NULL,
  `recycled` int(1) NOT NULL DEFAULT '0',
  `qty_to_inspect` decimal(11,2) DEFAULT NULL,
  `defect_rate` decimal(16,13) DEFAULT NULL,
  `sampled_inspected` int(11) unsigned DEFAULT NULL,
  `inspection_report_id` int(11) unsigned DEFAULT NULL,
  `poline_id` int(11) unsigned DEFAULT NULL,
  `item_status_id` int(11) NOT NULL DEFAULT '0',
  `production_line` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `FK_wsxix_assignments` (`assignment_id`),
  KEY `index_inspection_report_reportid` (`inspection_report_id`),
  KEY `index_ai_inspection_completed_date` (`inspection_completed_date`),
  KEY `index_item_poline` (`poline_id`),
  KEY `index_item_poline_assignment` (`poline_id`,`assignment_id`),
  KEY `wsxix_assignments_items_inspection_status_id` (`inspection_status_id`),
  KEY `wsxix_assignments_items_inspection_result_id` (`inspection_result_id`),
  KEY `inspection_completed_date` (`inspection_completed_date`,`inspection_result_id`,`inspection_status_id`)

) ENGINE=InnoDB AUTO_INCREMENT=3324705 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


CREATE TABLE `table3` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `buyer_id` int(11) unsigned DEFAULT NULL,
  `client_id` int(11) unsigned DEFAULT NULL,
  `merchandiser_user_id` int(11) unsigned DEFAULT NULL,
  `factory_id` int(11) unsigned DEFAULT NULL,
  `dc_id` int(11) unsigned DEFAULT NULL,
  `parent_id` int(10) unsigned DEFAULT NULL,
  `supplier_id` int(11) unsigned NOT NULL,
  `importer_id` int(11) unsigned NOT NULL,
  `project_id` int(11) unsigned NOT NULL,
  `tier_two_id` int(11) unsigned DEFAULT NULL,
  `store_id` int(11) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `buyer_id` (`buyer_id`),
  KEY `client_id` (`client_id`),
  KEY `merchandiser_user_id` (`merchandiser_user_id`),
  KEY `FK_po_factory_companies_id` (`factory_id`),
  KEY `FK_po_dc_companies_id` (`dc_id`),
  KEY `index_po_parentid` (`parent_id`),
  KEY `index_purchaseorders_cie` (`supplier_id`),
  KEY `index_poimporter_cie` (`importer_id`),
  KEY `index_purchaseorders_project` (`project_id`),
  KEY `facory_importer_id` (`factory_id`,`importer_id`),
  KEY `posu` (`project_id`,`supplier_id`),
  KEY `store_id` (`store_id`),
  KEY `tier_two_id` (`tier_two_id`)
) ENGINE=InnoDB AUTO_INCREMENT=34959 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `table4` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `parent_company_id` int(11) DEFAULT NULL,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,

  PRIMARY KEY (`id`),
  KEY `parent_company_id` (`parent_company_id`)
) ENGINE=InnoDB AUTO_INCREMENT=272 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

CREATE TABLE `table5` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
  `recycled` int(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `recycled` (`recycled`)
) ENGINE=InnoDB AUTO_INCREMENT=2066 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

UPDATE:
I made the changes from @Rick Martin's comments.

Here is my EXPLAIN result with your changes:
[screenshot: EXPLAIN output]

My table sizes:

Table1: 3.3 million rows

Table2: 3.5 million rows

Table3: 375 rows

Table4 (no longer used; removed from my query): 57 rows

Table5: 5 rows

@Willem Renzema

[screenshot: EXPLAIN output]

Best Answer

In addition to the changes you already made based on other answers and comments, I would add the following two indexes:

ALTER TABLE table1
    ADD INDEX dbawr1 (recycled, group_col, po_id);

ALTER TABLE table2
    ADD INDEX dbawr2 (assignment_id, inspection_completed_date, sampled_inspected);
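
If you want to confirm the indexes were created and then see whether the optimizer actually picks them, the usual check is something like the following (nothing here is specific to your schema beyond the index names above):

-- Confirm the new indexes exist on each table.
SHOW INDEX FROM table1 WHERE Key_name = 'dbawr1';
SHOW INDEX FROM table2 WHERE Key_name = 'dbawr2';

-- Then prefix your unchanged SELECT with EXPLAIN and check that the "key"
-- column shows dbawr1/dbawr2 and that "Using filesort" no longer appears
-- in the Extra column.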

Trying to get a MAX value for two different columns of the same table in one query is, unfortunately, harder to optimize.

With these indexes the optimizer should hopefully process table1 first, and so potentially avoid the filesort, which is likely a large part of the performance problem.

Provide the EXPLAIN output after these changes, and how quickly the query itself runs. We'll see if this made progress before I suggest more changes.

Forcing table1 to be first

Well, it appears the query optimizer refuses to run the query with table1 as the first table. I believe having table1 go first may be more efficient, so here is your query modified to accomplish that goal: I replaced the INNER JOIN and LEFT JOINs with STRAIGHT_JOINs.

SELECT
    MAX(table2.id) AS item_id,
    MAX(table2.sampled_inspected) AS sample_size,
    table1.group_col AS ass_group_id
FROM table1
STRAIGHT_JOIN table2 ON table1.id = table2.assignment_id
STRAIGHT_JOIN table3 ON table1.po_id = table3.id
STRAIGHT_JOIN table5 ON table5.id = table3.project_id
WHERE table1.recycled = 0
    AND table2.inspection_status_id > 0
    AND table2.inspection_status_id != 50
    AND table3.importer_id = 215
    AND table5.recycled = 0
    AND table3.project_id IN ('2062','2063','2064','2065')
    AND table2.inspection_completed_date >= DATE('2019-10-01')
    AND table2.inspection_completed_date <= DATE('2020-01-29')
GROUP BY ass_group_id

This query is logically equivalent because all of your LEFT JOINs were already being converted into INNER JOINs: your WHERE clause filters on columns of those outer-joined tables, which throws away the NULL-extended rows a LEFT JOIN would otherwise keep. A STRAIGHT_JOIN is the same as an INNER JOIN, but it forces the tables to be joined in the order they are listed.
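As a tiny illustration of that conversion, using hypothetical tables a and b (not from your schema):

-- NULL-extended rows can never satisfy the filter on b, so this LEFT JOIN
-- behaves exactly like an INNER JOIN:
SELECT a.id
FROM a
LEFT JOIN b ON b.a_id = a.id
WHERE b.x = 5;   -- rows where b's columns are NULL are filtered out anyway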

Run the STRAIGHT_JOIN query and let me know how it performs. Also, try it like you did before, with just one of the MAX expressions at a time, and see if either of those is faster.
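
To make "one MAX at a time" concrete, here is a sketch of one of the two variants; it is just the query above with one aggregate removed, so nothing new is introduced:

-- Variant A: keep only MAX(table2.id); time this one.
SELECT
    MAX(table2.id) AS item_id,
    table1.group_col AS ass_group_id
FROM table1
STRAIGHT_JOIN table2 ON table1.id = table2.assignment_id
STRAIGHT_JOIN table3 ON table1.po_id = table3.id
STRAIGHT_JOIN table5 ON table5.id = table3.project_id
WHERE table1.recycled = 0
    AND table2.inspection_status_id > 0
    AND table2.inspection_status_id != 50
    AND table3.importer_id = 215
    AND table5.recycled = 0
    AND table3.project_id IN ('2062','2063','2064','2065')
    AND table2.inspection_completed_date >= DATE('2019-10-01')
    AND table2.inspection_completed_date <= DATE('2020-01-29')
GROUP BY ass_group_id;

-- Variant B: identical, except the SELECT list keeps only
-- MAX(table2.sampled_inspected) AS sample_size.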

Last of the ideas

The only other thing I can think of that might help (and this may be why the optimizer doesn't want to process table1 first) is that the column you are grouping by is 100 bytes wide.

Is varchar(25) with the utf8mb4 charset required? That charset is a good default, but it is up to 4 bytes per character, and if the column only holds letters and numbers you can reduce the width by a factor of 4 by switching to something like the latin1 charset, which is 1 byte per character. Also, if it were possible to switch that column to an INT, that would be even better, as an INT is 4 bytes, up to 25 times smaller than what you currently have. I don't expect that is practical, though.
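For reference, the charset change would look roughly like this (a sketch only; it assumes group_col really contains nothing outside latin1, and it rewrites the table, so test on a copy first):

-- Convert just this column to a 1-byte-per-character charset.
ALTER TABLE table1
    MODIFY `group_col` varchar(25)
        CHARACTER SET latin1 COLLATE latin1_swedish_ci DEFAULT NULL;

The indexes that include group_col (your existing recycled/group_col keys and the new dbawr1) will be rebuilt as part of that ALTER.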

Warning: Be careful about changing the charset on production data; if the column actually contains characters outside the new charset, you will lose data in the conversion.

Either way, definitely try my modified query before you bother with this column.