Mysql – Subquery in FROM slower than two separate queries

mariadbMySQLperformancequery-performance

I'm working on a system where users can buy "punch cards" to join events. A client buys a card from an organiser. The card is valid for specific event types and for a specified number of events.

The following tables are relevant:

  • event_type: the type of event, e.g. swimming or painting.
  • event_type_to_organiser: this table lists which event types belong to which organisers.
  • event_sequence: a collection of connected events. For example, if there is a swimming event every tuesday, those events would be one event_sequence. State 1 means an active sequence.
  • event: the event itself. This table has a state-column: state 1 means the event is enabled (there are other states for deleted, cancelled and pending events). The start_datetime column is used to determine if an event has alreday taken place or if it took place in the past.
  • event_card: this is a generic punch card. This table has a state-column: here state 1 is enabled.
  • event_card_to_event_type: this table defined which cards are valid for which event types.
  • event_card_to_client: this table contains cards that have been purchased by clients. State 1 is an active card, other states are deleted, expired and full. This table also contains the number of events for which the card is valid.
  • event_sequence_to_client: this table contains subscriptions from clients to event_sequences.
  • event_to_client: this table contains all events clients attended, as well as individual future events that clients have signed up for. So clients can either subscribe to a sequence, or to a single event. If an event took place in the past, the event_card_to_client_id lists which card was used for the event.
  • client_to_organiser: this table lists all connections between clients and organisers.

Each organiser has several event_types. Each event_type has several event_cards.
Clients can subscribe to event or event_sequences. If a client is subscribed to a future event, or if a client is subscribed to an event sequence that has events in the future, I need to know how many events the client has left on all of his/her cards for the associated event_type.

I've come up with a pretty monstrous query to accomplish this:

SELECT
  `sq2`.`event_type_id`,
  (CASE WHEN `sq2`.`total` IS NOT NULL
    THEN `sq2`.`total` - `sq2`.`used`
   ELSE NULL END) AS `events_left`
FROM
  (
    SELECT
      `sq1`.`id` AS `event_type_id`,
      CASE WHEN COUNT(`ectc`.`id`) = 0
        THEN 0
      ELSE (
        CASE WHEN COUNT(`ectc`.`id`) = COUNT(`ectc`.`event_count`)
          THEN SUM(DISTINCT `ectc`.`event_count`)
        ELSE NULL END
      ) END AS `total`,
      COUNT(`etc`.`id`) AS `used`
    FROM
      (
        SELECT DISTINCT `et`.`id`
        FROM `event_type` `et`
          JOIN `event_type_to_organiser` `etto`
            ON `etto`.`event_type_id` = `et`.`id`
          JOIN `client_to_organiser` `cto`
            ON `cto`.`organiser_id` = `etto`.`organiser_id`
               AND `cto`.`id` = 1
          #todo ID
          JOIN `event_sequence` `es`
            ON `et`.`id` = `es`.`event_type_id`
               AND `es`.`state` = 1
          JOIN `event` `e`
            ON `e`.`event_sequence_id` = `es`.`id`
               AND `e`.`start_datetime` > NOW()
               AND `e`.`state` = 1
          JOIN `event_sequence_to_client` `estc`
            ON `estc`.`event_sequence_id` = `es`.`id`
               AND `estc`.`client_id` = `cto`.`client_id`
          JOIN `event_card_to_event_type` `ectet`
            ON `ectet`.`event_type_id` = `et`.`id`
          JOIN `event_card`
            ON `event_card`.`id` = `ectet`.`event_card_id`
               AND `event_card`.`state` = 1
        UNION DISTINCT
        SELECT DISTINCT `et`.`id`
        FROM `event_type` `et`
          JOIN `event_type_to_organiser` `etto`
            ON `etto`.`event_type_id` = `et`.`id`
          JOIN `client_to_organiser` `cto`
            ON `cto`.`organiser_id` = `etto`.`organiser_id`
               AND `cto`.`id` = 1
          #todo ID
          JOIN `event_sequence` `es`
            ON `et`.`id` = `es`.`event_type_id`
               AND `es`.`state` = 1
          JOIN `event` `e`
            ON `e`.`event_sequence_id` = `es`.`id`
               AND `e`.`start_datetime` > NOW()
               AND `e`.`state` = 1
          JOIN `event_to_client` `etc`
            ON `etc`.`event_id` = `e`.`id`
               AND `etc`.`client_id` = `cto`.`client_id`
          JOIN `event_card_to_event_type` `ectet`
            ON `ectet`.`event_type_id` = `et`.`id`
          JOIN `event_card`
            ON `event_card`.`id` = `ectet`.`event_card_id`
               AND `event_card`.`state` = 1
      ) `sq1`
      JOIN `client_to_organiser` `cto`
        ON `cto`.`id` = 1
      #todo ID
      JOIN `event_card_to_event_type` `ectet`
        ON `ectet`.`event_type_id` = `sq1`.`id`
      JOIN `event_card`
        ON `event_card`.`id` = `ectet`.`event_card_id`
           AND `event_card`.`state` = 1
      LEFT JOIN `event_card_to_client` `ectc`
        ON `ectc`.`event_card_id` = `event_card`.`id`
           AND `ectc`.`client_id` = `cto`.`client_id`
           AND `ectc`.`state` = 1
      LEFT JOIN `event_to_client` `etc`
        ON `etc`.`event_card_to_client_id` = `ectc`.`id`
    GROUP BY `sq1`.`id`
  ) `sq2`

The innermost subquery determines which event_types are relevant for a specific client_to_organiser entry: that is, all event_types organised by the organiser, for which the client is subscribed to future events or sequences that contain future events.
The middle subquery determines the number of events the client has in total on all his/her cards for each event type. Since not every card is valid for each event_type, this needs to be determined for each event_type. Furthermore, the number of events already used is calculated here.
Finally, the outermost query calculates how many events the user has left for each event_type (total – used).


The query works fine and gives the correct data output. However, there appears to be a slight performance issue. The query above, when executed on our current dataset takes about 0.04 seconds to complete. That's not a huge deal-breaker at the moment, but in the future we'd like to speed it up a bit more. Curiously, if I split the query in two parts it performs about twice as fast:

        SELECT DISTINCT `et`.`id`
        FROM `event_type` `et`
          JOIN `event_type_to_organiser` `etto`
            ON `etto`.`event_type_id` = `et`.`id`
          JOIN `client_to_organiser` `cto`
            ON `cto`.`organiser_id` = `etto`.`organiser_id`
               AND `cto`.`id` = 1
          #todo ID
          JOIN `event_sequence` `es`
            ON `et`.`id` = `es`.`event_type_id`
               AND `es`.`state` = 1
          JOIN `event` `e`
            ON `e`.`event_sequence_id` = `es`.`id`
               AND `e`.`start_datetime` > NOW()
               AND `e`.`state` = 1
          JOIN `event_sequence_to_client` `estc`
            ON `estc`.`event_sequence_id` = `es`.`id`
               AND `estc`.`client_id` = `cto`.`client_id`
          JOIN `event_card_to_event_type` `ectet`
            ON `ectet`.`event_type_id` = `et`.`id`
          JOIN `event_card`
            ON `event_card`.`id` = `ectet`.`event_card_id`
               AND `event_card`.`state` = 1
        UNION DISTINCT
        SELECT DISTINCT `et`.`id`
        FROM `event_type` `et`
          JOIN `event_type_to_organiser` `etto`
            ON `etto`.`event_type_id` = `et`.`id`
          JOIN `client_to_organiser` `cto`
            ON `cto`.`organiser_id` = `etto`.`organiser_id`
               AND `cto`.`id` = 1
          #todo ID
          JOIN `event_sequence` `es`
            ON `et`.`id` = `es`.`event_type_id`
               AND `es`.`state` = 1
          JOIN `event` `e`
            ON `e`.`event_sequence_id` = `es`.`id`
               AND `e`.`start_datetime` > NOW()
               AND `e`.`state` = 1
          JOIN `event_to_client` `etc`
            ON `etc`.`event_id` = `e`.`id`
               AND `etc`.`client_id` = `cto`.`client_id`
          JOIN `event_card_to_event_type` `ectet`
            ON `ectet`.`event_type_id` = `et`.`id`
          JOIN `event_card`
            ON `event_card`.`id` = `ectet`.`event_card_id`
               AND `event_card`.`state` = 1

And:

SELECT
  `sq2`.`event_type_id`,
  (CASE WHEN `sq2`.`total` IS NOT NULL
    THEN `sq2`.`total` - `sq2`.`used`
   ELSE NULL END) AS `events_left`
FROM
  (
    SELECT
      `sq1`.`id` AS `event_type_id`,
      CASE WHEN COUNT(`ectc`.`id`) = 0
        THEN 0
      ELSE (
        CASE WHEN COUNT(`ectc`.`id`) = COUNT(`ectc`.`event_count`)
          THEN SUM(DISTINCT `ectc`.`event_count`)
        ELSE NULL END
      ) END AS `total`,
      COUNT(`etc`.`id`) AS `used`
    FROM
      (
        SELECT `id` FROM `event_type` WHERE `id` IN(1,2)
      ) `sq1`
      JOIN `client_to_organiser` `cto`
        ON `cto`.`id` = 1
      #todo ID
      JOIN `event_card_to_event_type` `ectet`
        ON `ectet`.`event_type_id` = `sq1`.`id`
      JOIN `event_card`
        ON `event_card`.`id` = `ectet`.`event_card_id`
           AND `event_card`.`state` = 1
      LEFT JOIN `event_card_to_client` `ectc`
        ON `ectc`.`event_card_id` = `event_card`.`id`
           AND `ectc`.`client_id` = `cto`.`client_id`
           AND `ectc`.`state` = 1
      LEFT JOIN `event_to_client` `etc`
        ON `etc`.`event_card_to_client_id` = `ectc`.`id`
    GROUP BY `sq1`.`id`
  ) `sq2`

When these two parts are executed separately, each part takes about 0.002 seconds. 0.004 seconds in total. That's about 5 times as fast as when I combine it into a single query.

Does anyone know what could be the reason the query performs much slower when executed as a single query, compared to executing it in two parts?


For completeness, here is the (stripped) database structure with some sample data: http://pastebin.com/ZkV3NcU2

I'm running MariaDB version 5.5.41.

Best Answer

its really hard to be able to truly optimize this without knowing the sizes of the tables, but I would recommend reorganizing queries like this with common table expressions (CTEs). A CTE works like a sub query, but allows you to throw results in memory and refer back to it multiple times. Since you go back to the same tables multiple times within your sub queries, i would recommend playing around with that. This wouldn't be the full code, just an idea of a starting place

WITH event_sequence_cte AS ( SELECT es.event_type_id FROM event_sequence es WHERE es.state = 1 ), event_cte AS ( SELECT e.event_sequence_id,e.id FROM event e WHERE e.start_datetime > NOW() AND e.state = 1 )