First off: Sorry for the title, couldn't think of anything better. Feel free to change it though.
Ok, assume I have the following table:
-----------------------------------------------------------------------------------
| id | category | value | foreign_id_1 | foreign_id_2 | foreign_id_3 | foreign_id_4 |
-----------------------------------------------------------------------------------
| 1 | A | 4 | 1 | 1 | 1 | 1 |
| 2 | B | 9 | 1 | 1 | 1 | 1 |
| 3 | A | 5 | 2 | 2 | 2 | 2 |
| 4 | A | 6 | 3 | 3 | 3 | 3 |
| 5 | B | 8 | 3 | 3 | 3 | 3 |
| | | | | ... | | |
-----------------------------------------------------------------------------------
Now I want to create a scatter plot for the categories (A
as x
, B
as y
based on the value
column).
The desired output would be:
----------------------
| foreign_id_1 | x | y |
----------------------
| 1 | 4 | 9 |
| 3 | 6 | 8 |
----------------------
Row 3 from the original table isn't contained, because there only is a value for A
and not for B
.
What I've tried so far is the following:
SELECT A.foreign_id_1, A.value as x, B.value AS y
FROM
(
SELECT value, foreign_id_1, foreign_id_2, foreign_id_3, foreign_id_4
FROM table
WHERE category = 'A'
) AS A
JOIN
(
SELECT value, foreign_id_1, foreign_id_2, foreign_id_3, foreign_id_4
FROM table
WHERE category = 'B'
) AS B
ON
(
A.foreign_id_1 = B.foreign_id_1
AND A.foreign_id_2 = B.foreign_id_2
AND A.foreign_id_3 = B.foreign_id_3
AND A.foreign_id_4 = B.foreign_id_4
)
That all works fine, but it's really slow (30s with my current data set and that might expand in the future). The problem seems to be that the only way to identify the "matching rows" (the ones that form x
and y
of a data point) is expensive, because all foreign keys have to match.
Now to the question:
Assuming that I cannot change the table structure, is there a way to make this query faster? Or maybe there is a different way to get the desired result that I couldn't think of?
Best Answer
There is no reason to use derived tables for this query. It can affect performance badly with MySQL's primitive optimizer. Try this query:
If the efficiency of the query is crucial, you can add a covering index, with the columns in tailor made order, especially for the query: