Mysql – Slow “Copying to tmp table” profile step

mariadbMySQLperformanceperformance-tuning

I have a MariaDB (MySQL) query that uses a DISTINCT to get my results. The query takes about 6 seconds to complete.

Is there any way to improve the speed of the `"Copying to tmp table" step? This is where the majority of the running time comes from.

A few things to note:

  • The temp table is created in-memory (not on disk)
  • I'd like to see if this can be improved without re-writing the query
  • fact_warehouse_inventory_change has ~48K rows
  • dim_date has ~9K rows starting from 01-01-1995 through 2020-12-31

QUERY

SELECT SQL_NO_CACHE DISTINCT fact_warehouse_inventory_change.warehouse_location_id
FROM fact_warehouse_inventory_change
INNER JOIN dim_date ON
    dim_date.date >= fact_warehouse_inventory_change.row_effective_date AND
    dim_date.date <= fact_warehouse_inventory_change.row_expiration_date

EXPLAIN

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: dim_date
         type: index
possible_keys: ux_date_date_id,date,date_desc
          key: date
      key_len: 3
          ref: NULL
         rows: 9498
     filtered: 100.00
        Extra: Using index; Using temporary
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: fact_warehouse_inventory_change
         type: ALL
possible_keys: row_effective_date,row_effective_date_row_expiration_date,row_expiration_date_row_effective_date_warehouse_location_id
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 48124
     filtered: 100.00
        Extra: Range checked for each record (index map: 0xC8)

TABLES

-- Abbreviated table definition
CREATE TABLE `dim_date` (
    `date_id` INT(11) NOT NULL,
    `date` DATE NOT NULL,
    `year` SMALLINT(6) NULL DEFAULT NULL,
    `year_for_week` SMALLINT(6) NULL DEFAULT NULL,
    `year_for_week_iso8601` SMALLINT(6) NULL DEFAULT NULL,
    `quarter` TINYINT(4) NULL DEFAULT NULL,
    `month` TINYINT(4) NULL DEFAULT NULL,
    `week_of_year` TINYINT(4) NULL DEFAULT NULL,
    `week_of_month` TINYINT(4) NULL DEFAULT NULL,
    `week_of_year_iso8601` TINYINT(4) NULL DEFAULT NULL,
    `week_of_month_iso8601` TINYINT(4) NULL DEFAULT NULL,
    `weekday_in_month` TINYINT(4) NULL DEFAULT NULL
    PRIMARY KEY (`date_id`, `date`),
    UNIQUE INDEX `ux_date_date_id` (`date`, `date_id`),
    UNIQUE INDEX `date` (`date`),
    UNIQUE INDEX `date_desc` (`date`)
)
COLLATE='utf8_general_ci'
/*!50100 PARTITION BY RANGE (year(date))
(PARTITION part0 VALUES LESS THAN (2010) ENGINE = InnoDB,
 PARTITION part1 VALUES LESS THAN (2011) ENGINE = InnoDB,
 PARTITION part2 VALUES LESS THAN (2012) ENGINE = InnoDB,
 PARTITION part3 VALUES LESS THAN (2013) ENGINE = InnoDB,
 PARTITION part4 VALUES LESS THAN (2014) ENGINE = InnoDB,
 PARTITION part5 VALUES LESS THAN (2015) ENGINE = InnoDB,
 PARTITION part6 VALUES LESS THAN (2016) ENGINE = InnoDB,
 PARTITION part7 VALUES LESS THAN (2017) ENGINE = InnoDB,
 PARTITION part8 VALUES LESS THAN (2018) ENGINE = InnoDB,
 PARTITION part9 VALUES LESS THAN (2019) ENGINE = InnoDB,
 PARTITION part10 VALUES LESS THAN MAXVALUE ENGINE = InnoDB)  */;

CREATE TABLE `fact_warehouse_inventory_change` (
    `id` INT(11) NOT NULL AUTO_INCREMENT,
    `warehouse_allocation_key` INT(11) NULL DEFAULT NULL,
    `date_id` INT(11) NOT NULL,
    `warehouse_item_id` INT(11) NOT NULL,
    `warehouse_location_id` INT(11) NOT NULL,
    `inventory_qty` INT(11) NULL DEFAULT NULL,
    `reorder_qty` INT(11) NULL DEFAULT NULL,
    `minimum_on_hand_qty` INT(11) NULL DEFAULT NULL,
    `individual_price` DECIMAL(10,2) NULL DEFAULT NULL,
    `unit_of_measure_price` DECIMAL(10,2) NULL DEFAULT NULL,
    `row_effective_date` DATE NOT NULL,
    `row_expiration_date` DATE NOT NULL,
    PRIMARY KEY (`id`),
    INDEX `date_id` (`date_id`),
    INDEX `warehouse_item_id` (`warehouse_item_id`),
    INDEX `row_effective_date` (`row_effective_date`),
    INDEX `warehouse_allocation_key` (`warehouse_allocation_key`),
    INDEX `warehouse_location_id` (`warehouse_location_id`),
    INDEX `row_effective_date_row_expiration_date` (`row_effective_date`, `row_expiration_date`),
    INDEX `row_expiration_date_row_effective_date_warehouse_location_id` (`row_expiration_date`, `row_effective_date`, `warehouse_location_id`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
AUTO_INCREMENT=47776
;

CONFIG

tmp_table_size                  = 256M
max_heap_table_size             = 256M
innodb_buffer_pool_size         = 4G
innodb_log_buffer_size          = 8M
innodb_log_files_in_group       = 2
innodb_file_per_table           = 1
innodb_open_files               = 400
innodb_io_capacity              = 400

Best Answer

  • The query is missing (after line 3) dim_date.date_id = ...date WHERE. Adding that will speed it up some. The following items assume that is present and critique the schema and query further...
  • That query is an example of why dates should not be normalized.
  • Do not redundantly specify UNIQUEness -- PK(a,b) + UNIQUE(b,a) -- change UNIQUE to INDEX.
  • There is no "DESC" index, so that second UNIQUE(date) is redundant. (8.0 will fix that deficiency; meanwhile most queries can run backwards efficiently.)
  • There was no advantage, and some performance disadvantages, in PARTITIONing dim_date.
  • There is a YEAR datatype.
  • Once you have thrown away dim_date completely, learn the functions needed for finding MONTH(), etc. And replace date_id with just date.
  • Well, OK, there is a use for dim_date -- whenever you need to report on the data and include dates for which there is no data. Still, date_id is useless, and date should be the PK.
  • fact_warehouse_inventory_change currently has under 50K rows; will it grow to millions? If so, we need to talk about creating and maintaining "Summary tables".
  • There at least 2 redundant indexes in fact_warehouse_inventory_change -- INDEX(a) is redundant when you also have INDEX(a,b).
  • Even after un-normalizing date, the query will still be a full table scan. This is because no index is useful against date >= eff AND date <= exp. What percentage of rows will pass that range test? If it is only a small fraction, then consider removing the failing rows from the table.
  • Building indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql
  • DW: http://mysql.rjweb.org/doc.php/datawarehouse
  • Summary tables: http://mysql.rjweb.org/doc.php/summarytables
  • How much RAM do you have? tmp_table_size should be no more than about 1% of RAM.innodb_buffer_pool_size` should be about 70% of available RAM.