Mysql – How to optimise multiple joins with OR

join;MySQLoptimizationperformance

I need to extract some statistical information form drupal database tables. The problem with drupal is that it splits it's data for entities across multiple tables and getting simple information from such database usually requires multiple joins.

In this case I have structure similar to this (I'm simplifying things for the purpose of this question as the whole structure is way more complicated):

node table

  • nid
  • title
  • uid (user id from users table)
  • type (can be: flight or airport)
  • [others]

departure_field table

  • id
  • flight_node_id (node id from nodes table with type 'flight')
  • departure_node_id (node id from nodes table with type 'airport')

arrival_field table

  • id
  • flight_node_id (node id from nodes table with type 'flight')
  • arrival_node_id (node id from nodes table with type 'airport')

The query

Now let's say I want to know top airports visited by the given user. The query could go like this:

SELECT
  COUNT(a.nid) AS cnt, a.nid, a.title
FROM
  node a, /* Airport */
  departure_field dep,
  arrival_field arr,
  node f /* Flight */
WHERE
  /* All flights of user with id 100 */
  f.type = 'flight' AND f.uid = 100
  AND /* Include data for departure and arrival airports */
  f.nid = dep.flight_node_id AND f.nid = arr.flight_node_id
  /* Combine that with airport data */
  AND (a.nid = dep.departure_node_id OR a.nid = arr.arrival_node_id)
GROUP BY
  a.nid
ORDER BY
  a.cnt DESC

The problem

This query, although looks quite simple, uses temp table and file sort. And of course runs for ages. How can I optimise it for MySQL to run fast enough to be usable during webpage rendering? For now the query executes few minutes with database having data for 50k+ flights.

Note that this is not the actual query and table structure. The field naming in the drupal database is insane, so I tried do simplify things for the purpose of this question.

Best Answer

As jkavalik suggested in his comment, I tried UNION and it worked great. The performance improvement was astronomical.

SELECT
  COUNT(a.nid) AS cnt, a.nid, a.title
FROM
  node a /* Airport */
JOIN
  (
    (SELECT
      dep.nid
    FROM
      departure_field dep
    JOIN
      node f
    ON
      f.nid = dep.flight_node_id
    WHERE
      f.type = 'flight' AND f.uid = 100
    )
  UNION ALL
    (SELECT
      arr.nid
    FROM
      arrival_field are
    JOIN
      node f
    ON
      f.nid = arr.arrival_node_id
    WHERE
      f.type = 'flight' AND f.uid = 100
    )
  ) airport
ON
  airport.id = a.nid
GROUP BY
  a.nid
ORDER BY
  a.cnt DESC

Note that in my case UNION ALL was the key, because it includes duplicates from both arrivals and departures. And this was important because I wanted to count how much times each airport was visited. Without ALL each airport would show up only once as simple UNION removes duplicates.