MySQL – Query Multiplying Records Returned (Cartesian Product)

MySQL

I am using the following query which is returning a huge number of multiplied records – i.e. a cartesian product:

SELECT * FROM specimen, topography_index, morphology, functions 
WHERE 
SUBSTRING(specimen.topography_index, 2, 2) = topography_index.topography_index_code 
AND 
if(specimen.snop_axis = 'M', morphology.morphology_code = specimen.snop_code, functions.functions_code = specimen.snop_code) 
AND 
specimen.topography_index = '_ORGAN_'
ORDER BY 
(specimen.room = 'f') DESC,specimen.snop_code

The number of records being returned is around 59,000. However, specimen table only has 469 records. The morphology and functions tables have around 2000 records each, and the topography_index table has 29 records, so I'm guessing that the results are being multiplied by the number of records in 'topography_index' and 'morphology' or 'functions' tables (29 x 2000), whichever one that doesn't have a join in the if clause.

How should the query be written so that this doesn't happen?

Note that GROUP BY is NOT an option as the query is so slow with all the records being returned.

http://sqlfiddle.com/#!2/2bda8/1

Perhaps this could be handled by a sub select statement?

Any help appreciated!

Best Answer

Your cartesian product should be the product of # of rows in properly joined tables times the rows in tables that are not properly joined (i.e. don't have effective condition in the WHERE clause). For cases where specimen.snop_axis = 'M' morphology and specimen are propely joined but functions isn't, and when specimen.snop_axis <> 'M' functions and specimen are properly joined but morphology isn't. That's most likely why the resulting number of rows doesn't seem to be any direct product of rows in the tables.

To solve this I'd first try to join all the tables properly by replacing

AND if(specimen.snop_axis = 'M', morphology.morphology_code = specimen.snop_code, functions.functions_code = specimen.snop_code) 

with

AND functions.functions_code = specimen.snop_code
AND morphology.morphology_code = specimen.snop_code

To me, this seems to give ok result with the sample data, but I wouldn't be too sure of how it behaves with your real data.

I'm not completely sure what you are trying to obtain by having that IF condition in WHERE clause. If the objective is to return morphology data and no functions data when specimen.snop_axis='M' and in other cases to return functions data and no morphology data, I'd suggest having the selection logic applied in the columns that are returned, somehow like this:

SELECT
IF(specimen.snop_axis = 'M', morphology_pk, functions_pk) matching_pk,
IF(specimen.snop_axis = 'M', morphology_code, functions_code) matching_code,
IF(specimen.snop_axis = 'M', morphology_nomen, functions_nomen) matching_nomen,
specimen.*,
topography_index.*

instead of just having columns from both tables returned always. But as I said, I'm not quite sure I understand what is required.