MySQL – Optimize finding events with overbooked attendees

MySQL

I have the following structure to store events and their attendees:

CREATE TABLE `plan_event` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `start_datetime` datetime DEFAULT NULL,
  `end_datetime` datetime DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `start_datetime_idx` (`start_datetime`),
  KEY `end_datetime_idx` (`end_datetime`),
  KEY `start_end_datetime_idx` (`start_datetime`,`end_datetime`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `plan_event_attendee` (
  `event_id` int(11) NOT NULL,
  `user_id` int(11) NOT NULL,
  PRIMARY KEY (`event_id`,`user_id`),
  KEY `IDX_E8625CBD71F7E88B` (`event_id`),
  KEY `IDX_E8625CBDA76ED395` (`user_id`),
  CONSTRAINT `FK_E8625CBD71F7E88B` FOREIGN KEY (`event_id`) REFERENCES `plan_event` (`id`) ON DELETE CASCADE,
  CONSTRAINT `FK_E8625CBDA76ED395` FOREIGN KEY (`user_id`) REFERENCES `plan_user` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

I would like to find all events where someone is also attending another event at the same time (I don't need to know who exactly, just which events have at least one such attendee). This is what I have so far:

    SELECT e1.id, e2.id
      FROM plan_event e1
INNER JOIN plan_event e2
        ON e2.start_datetime < e1.end_datetime
       AND e2.end_datetime > e1.start_datetime
       AND e2.id <> e1.id
INNER JOIN plan_event_attendee a1
        ON a1.event_id = e1.id
INNER JOIN plan_event_attendee a2
        ON a2.event_id = e2.id
     WHERE a1.user_id = a2.user_id
       AND e1.start_datetime > '2015-06-01'
       AND e1.end_datetime < '2016-06-01'
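The join condition on e2 is the usual interval-overlap test: two events clash when each one starts before the other one ends. Because both comparisons are strict, events that merely touch (one ends exactly when the other starts) are not reported as clashes. A minimal sketch that checks the same predicate against a few hypothetical time slots (the labels and timestamps are made up purely for illustration) could look like this:

```sql
-- Hypothetical e1 = 2015-07-01 09:00-10:00; each row of t is a candidate e2.
-- The predicate is the same one used in the join on e2 above.
SELECT t.label,
       t.e2_start < TIMESTAMP '2015-07-01 10:00:00'  -- e2.start_datetime < e1.end_datetime
   AND t.e2_end   > TIMESTAMP '2015-07-01 09:00:00'  -- e2.end_datetime > e1.start_datetime
       AS overlaps
  FROM (
        SELECT 'partial overlap' AS label,
               TIMESTAMP '2015-07-01 09:30:00' AS e2_start,
               TIMESTAMP '2015-07-01 11:00:00' AS e2_end
        UNION ALL
        SELECT 'back to back',
               TIMESTAMP '2015-07-01 10:00:00',
               TIMESTAMP '2015-07-01 11:00:00'
        UNION ALL
        SELECT 'contained',
               TIMESTAMP '2015-07-01 09:15:00',
               TIMESTAMP '2015-07-01 09:45:00'
       ) AS t;
```

This should return overlaps = 1 for the partial and contained slots and 0 for the back-to-back one. If back-to-back events should also count as clashes, the comparisons would need to become <= and >=.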

Unfortunately, with the given dataset the query is not performant enough, likely because joining each table to itself multiplies the number of row combinations MySQL has to examine. EXPLAIN gives me the following output:

+----+-------------+-------+--------+--------------------------------------------------------------------+------------------------+---------+------------------+-------+--------------------------+
| id | select_type | table | type   | possible_keys                                                      | key                    | key_len | ref              | rows  | Extra                    |
+----+-------------+-------+--------+--------------------------------------------------------------------+------------------------+---------+------------------+-------+--------------------------+
|  1 | SIMPLE      | e1    | range  | PRIMARY,start_datetime_idx,end_datetime_idx,start_end_datetime_idx | start_end_datetime_idx | 9       | NULL             | 26640 | Using where; Using index |
|  1 | SIMPLE      | a1    | ref    | PRIMARY,IDX_E8625CBD71F7E88B,IDX_E8625CBDA76ED395                  | PRIMARY                | 4       | aire.e1.id       |     4 | Using index              |
|  1 | SIMPLE      | a2    | ref    | PRIMARY,IDX_E8625CBD71F7E88B,IDX_E8625CBDA76ED395                  | IDX_E8625CBDA76ED395   | 4       | aire.a1.user_id  |   475 | Using where; Using index |
|  1 | SIMPLE      | e2    | eq_ref | PRIMARY,start_datetime_idx,end_datetime_idx,start_end_datetime_idx | PRIMARY                | 4       | aire.a2.event_id |     1 | Using where              |
+----+-------------+-------+--------+--------------------------------------------------------------------+------------------------+---------+------------------+-------+--------------------------+

What other performance optimizations could I attempt, or is this the best I can get?
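Since only the event IDs are needed, one variant that may be worth comparing in EXPLAIN is a semi-join written with EXISTS. It returns each e1 once and allows MySQL to stop searching as soon as it finds one shared attendee in an overlapping event, instead of producing one row per shared attendee and overlapping event. This is only a sketch against the tables above; whether it is actually faster depends on the MySQL version and the data, so it should be measured rather than assumed:

```sql
-- Sketch: each event in the date range is returned at most once,
-- as soon as one attendee is found in another, overlapping event.
SELECT e1.id
  FROM plan_event e1
 WHERE e1.start_datetime > '2015-06-01'
   AND e1.end_datetime < '2016-06-01'
   AND EXISTS (
         SELECT 1
           FROM plan_event_attendee a1
           JOIN plan_event_attendee a2
             ON a2.user_id = a1.user_id
            AND a2.event_id <> a1.event_id  -- same person, different event
           JOIN plan_event e2
             ON e2.id = a2.event_id
          WHERE a1.event_id = e1.id
            AND e2.start_datetime < e1.end_datetime  -- same overlap test as before
            AND e2.end_datetime > e1.start_datetime
       );
```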

Best Answer

The query itself looks like pretty much the right way to do it. As for indexing strategy, I suggest that you drop these indexes:

KEY `start_datetime_idx` (`start_datetime`) -- covered by start_end_datetime_idx

KEY `IDX_E8625CBD71F7E88B` (`event_id`) -- covered by PRIMARY

and replace:

KEY `IDX_E8625CBDA76ED395` (`user_id`),

with:

KEY `IDX_E8625CBDA76ED395` (`user_id`,`event_id`)
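Applied to the schema in the question, those changes could be written roughly as below. This is a sketch: doing the DROP and ADD for the user_id index in a single ALTER TABLE means the foreign key on user_id is never left without an index that starts with user_id, and the event_id foreign key can rely on PRIMARY once its own index is gone. Adding the new index has to build it, which can take a while on a large table, so it is worth trying this on a copy first. Note also that InnoDB secondary indexes already store the primary key columns, so the existing (user_id) index effectively contains event_id as well; the last change mostly makes that explicit and may not change the plan much.

```sql
-- start_datetime_idx is a leftmost prefix of start_end_datetime_idx.
ALTER TABLE plan_event
  DROP INDEX start_datetime_idx;

-- IDX_E8625CBD71F7E88B (event_id) is a leftmost prefix of PRIMARY (event_id, user_id).
-- The user_id index is replaced by a composite one in the same statement,
-- so the user_id foreign key always has an index that starts with user_id.
ALTER TABLE plan_event_attendee
  DROP INDEX IDX_E8625CBD71F7E88B,
  DROP INDEX IDX_E8625CBDA76ED395,
  ADD INDEX IDX_E8625CBDA76ED395 (user_id, event_id);
```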