Join with certain tables causes execution time to skyrocket

drupalMySQLmysql-8.0performance-testingquery-performance

Database Application: Ver 8.0.30 for Linux on x86_64 (MySQL Community Server – GPL)

Server: Ubuntu 18.04.6

RAM: 32GB

CPUs: 8 core

The underlying application framework is Drupal and it uses a query builder UI (Views module) to generate queries for reports. Please find the non performant query below. Without the join to the flagging table the query executes under few seconds. I have improved the query based on suggestions by @Rick James and @mustaccio. The query time still exceeds 4 minutes when joined with flagging table.

EXPLAIN SELECT 1 AS `expression`
FROM
`node_field_data` `node_field_data`
LEFT JOIN `flagging` `flagging_node_field_data` ON node_field_data.nid = flagging_node_field_data.entity_id AND flagging_node_field_data.flag_id = 'verify_blood_group'
LEFT JOIN `node__field_date_of_collection` `node__field_date_of_collection` ON node_field_data.nid = node__field_date_of_collection.entity_id AND node__field_date_of_collection.deleted = '0'
LEFT JOIN `node__og_audience` `node__og_audience` ON node_field_data.nid = node__og_audience.entity_id AND (node__og_audience.deleted = '0' AND node__og_audience.langcode = node_field_data.langcode)
WHERE ((node__og_audience.og_audience_target_id IN('30', '229', '5026', '60887', '198081', '350754', '519498', '519499', '566913', '568976', '571016', '642633', '739096', '769874', '770003', '800588', '1051756', '1056092', '1101838', '1465616', '1730929', '2045068', '2269366', '3535017', '1836317', '3387310', '9900000'))) AND ((`node_field_data`.`status` = '1') AND (`node_field_data`.`type` IN ('donation_record')) AND (node__field_date_of_collection.field_date_of_collection_value BETWEEN '2022-08-27' AND ('2022-09-02' + INTERVAL 1 DAY)));

Please see the Query Explain below.

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: node__field_date_of_collection
   partitions: NULL
         type: range
possible_keys: PRIMARY,field_date_of_collection_value
          key: field_date_of_collection_value
      key_len: 82
          ref: NULL
         rows: 22808
     filtered: 10.00
        Extra: Using where; Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: node_field_data
   partitions: NULL
         type: ref
possible_keys: PRIMARY,node__id__default_langcode__langcode,node_field__type__target_id,node__status_type
          key: PRIMARY
      key_len: 4
          ref: ebloodbanking8.node__field_date_of_collection.entity_id
         rows: 1
     filtered: 5.00
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: node__og_audience
   partitions: NULL
         type: ref
possible_keys: PRIMARY,og_audience_target_id,og_audience_entityid_deleted_langcode_value
          key: PRIMARY
      key_len: 5
          ref: ebloodbanking8.node__field_date_of_collection.entity_id,const
         rows: 1
     filtered: 10.00
        Extra: Using where
*************************** 4. row ***************************
           id: 1
  select_type: SIMPLE
        table: flagging_node_field_data
   partitions: NULL
         type: ref
possible_keys: flagging_fid_etid,flagging_fid_uid_etid
          key: flagging_fid_etid
      key_len: 34
          ref: const
         rows: 388428
     filtered: 100.00
        Extra: Using where; Using index

Please find the flagging table describe:

| flagging | CREATE TABLE `flagging` (
  `id` int unsigned NOT NULL AUTO_INCREMENT,
  `flag_id` varchar(32) CHARACTER SET ascii NOT NULL COMMENT 'The ID of the target entity.',
  `uuid` varchar(128) CHARACTER SET ascii NOT NULL,
  `entity_type` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `entity_id` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `global` tinyint DEFAULT NULL,
  `uid` int unsigned NOT NULL COMMENT 'The ID of the target entity.',
  `session_id` varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,
  `created` int DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `flagging_fid_etid` (`flag_id`,`entity_id`),
  KEY `flagging_fid_uid_etid` (`flag_id`,`uid`,`entity_id`),
  KEY `flagging_type_fid_etid` (`entity_type`,`flag_id`,`entity_id`),
  KEY `flagging_type_fid_uid_etid` (`entity_type`,`flag_id`,`uid`,`entity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2135664 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='The base table for flagging entities.' |

Show create for the table node__field_date_of_collection

| node__field_date_of_collection | CREATE TABLE `node__field_date_of_collection` (
  `bundle` varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int unsigned NOT NULL COMMENT 'The entity revision id this data is attached to',
  `langcode` varchar(32) CHARACTER SET ascii NOT NULL DEFAULT '' COMMENT 'The language code for this data item.',
  `delta` int unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `field_date_of_collection_value` varchar(20) COLLATE utf8mb4_general_ci NOT NULL COMMENT 'The date value.',
  PRIMARY KEY (`entity_id`,`deleted`,`delta`,`langcode`),
  KEY `bundle` (`bundle`),
  KEY `revision_id` (`revision_id`),
  KEY `field_date_of_collection_value` (`field_date_of_collection_value`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='Data storage for node field field_date_of_collection.'

Show create for the table node__og_audience

| node__og_audience | CREATE TABLE `node__og_audience` (
  `bundle` varchar(128) CHARACTER SET ascii NOT NULL DEFAULT '' COMMENT 'The field instance bundle to which this row belongs, used when deleting a field instance',
  `deleted` tinyint NOT NULL DEFAULT '0' COMMENT 'A boolean indicating whether this data item has been deleted',
  `entity_id` int unsigned NOT NULL COMMENT 'The entity id this data is attached to',
  `revision_id` int unsigned NOT NULL COMMENT 'The entity revision id this data is attached to',
  `langcode` varchar(32) CHARACTER SET ascii NOT NULL DEFAULT '' COMMENT 'The language code for this data item.',
  `delta` int unsigned NOT NULL COMMENT 'The sequence number for this data item, used for multi-value fields',
  `og_audience_target_id` int unsigned NOT NULL COMMENT 'The ID of the target entity.',
  PRIMARY KEY (`entity_id`,`deleted`,`delta`,`langcode`),
  KEY `bundle` (`bundle`),
  KEY `revision_id` (`revision_id`),
  KEY `og_audience_target_id` (`og_audience_target_id`),
  KEY `og_audience_entityid_deleted_langcode_value` (`entity_id`,`deleted`,`langcode`,`og_audience_target_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='Data storage for node field og_audience.'

Show create for the table node_field_data

| node_field_data | CREATE TABLE `node_field_data` (
  `nid` int unsigned NOT NULL,
  `vid` int unsigned NOT NULL,
  `type` varchar(32) CHARACTER SET ascii NOT NULL COMMENT 'The ID of the target entity.',
  `langcode` varchar(12) CHARACTER SET ascii NOT NULL,
  `status` tinyint NOT NULL,
  `uid` int unsigned NOT NULL COMMENT 'The ID of the target entity.',
  `title` varchar(255) COLLATE utf8mb4_general_ci NOT NULL,
  `created` int NOT NULL,
  `changed` int NOT NULL,
  `promote` tinyint NOT NULL,
  `sticky` tinyint NOT NULL,
  `default_langcode` tinyint NOT NULL,
  `revision_translation_affected` tinyint DEFAULT NULL,
  PRIMARY KEY (`nid`,`langcode`),
  KEY `node__id__default_langcode__langcode` (`nid`,`default_langcode`,`langcode`),
  KEY `node__vid` (`vid`),
  KEY `node_field__type__target_id` (`type`),
  KEY `node_field__uid__target_id` (`uid`),
  KEY `node_field__created` (`created`),
  KEY `node_field__changed` (`changed`),
  KEY `node__status_type` (`status`,`type`,`nid`),
  KEY `node__frontpage` (`promote`,`status`,`sticky`,`created`),
  KEY `node__title_type` (`title`(191),`type`(4))
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci COMMENT='The data table for node entities.'

Please find some of the relevant database variable settings.

[mysqld]
default-storage-engine=InnoDB
join_buffer_size = 8M
read_buffer_size = 4M
sort_buffer_size = 8M
thread_cache_size = 8
interactive_timeout = 60
wait_timeout = 60 # Time in seconds
connect_timeout = 10
max_connect_errors = 10000
tmp_table_size = 32M
max_heap_table_size = 32M

# InnoDB Settings
innodb_buffer_pool_size=18G
innodb_flush_log_at_trx_commit=2
#Set the log file size to about 25% of the buffer pool size
innodb_log_file_size=6G
innodb_log_buffer_size=64M
innodb_flush_method=O_DIRECT
innodb_buffer_pool_instances=8
innodb_stats_on_metadata=0
innodb_lock_wait_timeout=100
innodb_write_io_threads=8

Please share what changes can be made to make this more performant.Indexes have been added to flagging table. Please share monitoring tools that can help us understand the problems better, database global variable changes that can make this query execution faster. Thanks.

Note: As suggested by Rick James, changing the data type for column entity_id in flagging table from varchar to unsigned int resolved the query performance during joins. Thanks.

Best Answer

These indexes may help performance:

node_field_data:  INDEX(status, type, nid,  langcode)
flagging_node_field_data:  INDEX(entity_id,  flag_id)
node__field_blood_donation_camp:  INDEX(entity_id,  deleted, langcode, bundle)
node__field_date_of_collection:  INDEX(entity_id, deleted, field_date_of_collection_value)
node__og_audience:  INDEX(entity_id,  deleted, langcode, og_audience_target_id)

When adding a composite index, DROP index(es) with the same leading columns. That is, when you have both INDEX(a) and INDEX(a,b), toss the former.

There is an OR that may be slowing down the query significantly. Can the query be rephrased? Or we can discuss how to switch to UNION.

Is there any reason not to simplify

DATE_FORMAT(dc.field_date_of_collection_value,
            '%Y-%m-%d\T%H:%i:%s') BETWEEN 
DATE_FORMAT('2022-08-25T00:00:00', '%Y-%m-%d\T%H:%i:%s') AND
DATE_FORMAT('2022-08-31T00:00:00', '%Y-%m-%d\T%H:%i:%s')

to

dc.field_date_of_collection_value  BETWEEN
        '2022-08-25` AND '2022-08-31'

That will make my suggested index more effective. (see sargable ) (Also note that, either way, there is a extra one second, namely midnight of the 31st)

Most of what went into designing these indexes comes from Index Cookbook

Please don't use such long table names; they make the query hard to read.

Inconsistency

You have nid and entity_id declared as INT in most places, but there is a difference in one other place:

entity_id varchar(255) COLLATE utf8mb4_general_ci DEFAULT NULL,

If it is really just an INT, not a VARCHAR, then fix the inconsistency.

When doing JOIN ON dissimilarly defined columns, the INDEXes usually cannot be used.