MySQL Performance – Issues Extracting Scatter Plot Data

join;MySQLperformancequery-performance

First off: Sorry for the title, couldn't think of anything better. Feel free to change it though.

Ok, assume I have the following table:

 -----------------------------------------------------------------------------------
| id | category | value | foreign_id_1 | foreign_id_2 | foreign_id_3 | foreign_id_4 |
 -----------------------------------------------------------------------------------
|  1 | A        |     4 | 1            | 1            | 1            | 1            |
|  2 | B        |     9 | 1            | 1            | 1            | 1            |
|  3 | A        |     5 | 2            | 2            | 2            | 2            |
|  4 | A        |     6 | 3            | 3            | 3            | 3            |
|  5 | B        |     8 | 3            | 3            | 3            | 3            |
|    |          |       |              |      ...     |              |              |
 -----------------------------------------------------------------------------------

Now I want to create a scatter plot for the categories (A as x, B as y based on the value column).
The desired output would be:

 ----------------------
| foreign_id_1 | x | y |
 ----------------------
| 1            | 4 | 9 |
| 3            | 6 | 8 |
 ----------------------

Row 3 from the original table isn't contained, because there only is a value for A and not for B.

What I've tried so far is the following:

SELECT A.foreign_id_1, A.value as x, B.value AS y
FROM
  (
    SELECT value, foreign_id_1, foreign_id_2, foreign_id_3, foreign_id_4
    FROM table
    WHERE category = 'A'
  ) AS A
JOIN
  (
    SELECT value, foreign_id_1, foreign_id_2, foreign_id_3, foreign_id_4
    FROM table
    WHERE category = 'B'
  ) AS B
ON
  (
    A.foreign_id_1 = B.foreign_id_1
    AND A.foreign_id_2 = B.foreign_id_2
    AND A.foreign_id_3 = B.foreign_id_3
    AND A.foreign_id_4 = B.foreign_id_4
  )

That all works fine, but it's really slow (30s with my current data set and that might expand in the future). The problem seems to be that the only way to identify the "matching rows" (the ones that form x and y of a data point) is expensive, because all foreign keys have to match.

Now to the question:

Assuming that I cannot change the table structure, is there a way to make this query faster? Or maybe there is a different way to get the desired result that I couldn't think of?

Best Answer

There is no reason to use derived tables for this query. It can affect performance badly with MySQL's primitive optimizer. Try this query:

SELECT 
    a.foreign_id_1, a.value AS x, b.value AS y
FROM
    table AS a
  JOIN
    table AS b
      ON  a.foreign_id_1 = b.foreign_id_1
      AND a.foreign_id_2 = b.foreign_id_2
      AND a.foreign_id_3 = b.foreign_id_3
      AND a.foreign_id_4 = b.foreign_id_4
WHERE
      a.category = 'A'
  AND b.category = 'B' ;

If the efficiency of the query is crucial, you can add a covering index, with the columns in tailor made order, especially for the query:

ALTER TABLE tablename
  ADD INDEX category_foreign_ids_value          -- pick a name for the index
                                                              -- columns from:
    (category,                                                -- the WHERE 
     foreign_id_1, foreign_id_2, foreign_id_3, foreign_id_4,  -- the JOIN ... ON
     value) ;                                                 -- the SELECT