I have a table (let's call it extra_info
) in MySQL that has various extra information/labels on user accounts:
account_id / data_key
11 / 'test'
11 / 'blah'
12 / 'coupon a'
12 / 'coupon b'
15 / 'whatever'
18 / 'test'
And so on.
I am building reports on other tables but need to filter out records that are present in the extra_info table as 'test'.
So in the example above, records 1-10, 13-14, 16-17 are fine because they are not in extra_info, 11 needs to be filtered out (it is 'test'), 12 is fine (but obviously just need to have it once even though is has 2 labels, 15 is fine (not 'test'), and 18 needs to be removed.
I have done this with subquery in the WHERE
clause:
WHERE
-- filter out test labels
a.id NOT IN (SELECT ei.account_id
FROM extra_info AS ei
WHERE ei.data_key = 'test')
This does its job, however, I have a suspicion that a regular join would have worked faster. What's the right syntax for it?
If I just join and filter out the ones with data_key='test', #11 would still be there – so this is not a solution. Grouping by account_id will probably has performance implications and basically same impact as the current code, right?
Any ideas?
Best Answer
One option would look something like this:
This might seem counter-intuitive, since "obviously" er.account_id is never actually
NULL
sitting at rest in the table, so I'll explain:The
LEFT JOIN
of course means all rows fromaccount
and only the matching rows fromextra_data
... and by "matching rows" we mean those having an identical account_id and a value of 'test' in data_key.Every row in a result set has, at some point during execution, either a value (or a
NULL
) for every column from all tables, even if those columns aren't listed in theSELECT
list... so, in aLEFT JOIN
where the table on the right has no corresponding rows, the columns in the result set that "came from" the right-hand table areNULL
... so, WHERE ei.account_id IS NULL filters out the rows with a value for ei.account_id (the ones with matching account_id and 'test' in data_key) leaving you with a result set containing the rows that you want.If it's not apparent now, it makes more sense when you remember that the
WHERE
clause is not telling the server which rows to find in this case... it's specifying which rows not to eliminate.Now, if you need other data from
extra_info
in your report, you will need to [left] join that same table again in your query with different aliases in addition to ei in order to pluck out the other values.@ypercube's advice on indexes still stands... in the case of the query in the question, an index on (data_key,account_id) would probably be more useful, since the subquery isn't correlated to anything in the outer query, while in the
LEFT JOIN
query, this index would be what you'd want: