Good evening,
I'm currently up to try to improve the performance of some of my queries.
As far as I know, statements like "IN" or "NOT IN" are even faster with a large amount of values if a subquery on an indexed field without conditions is used.
SELECT * FROM table1 WHERE field1 NOT IN (SELECT index_field FROM table2)
When it comes to use conditions at table2
, the query becomes realy slow on a large amount of data.
SELECT *
FROM table1
WHERE
field1 NOT IN (
SELECT
index_field
FROM table2
WHERE
user_id = '2'
)
I was thinking about using LEFT Join
but when I need to to the filtering for the user_id
, it also becomes slow.
How may I solve this problem? Currently, I've no clue about that.
The original query looked sth. like this
SELECT
i.*
FROM stream_item si
LEFT JOIN user__publisher_item ui
ON ui.user_id = 2
&& ui.item_id = si.item_id
INNER JOIN item i
ON i.id = si.item_id
WHERE
si.stream_id IN (5,7,8,9,19,24,29,42,43,44)
&& ui.id IS NULL || (
ui.`read` = 0 || ui.saved = 1
)
GROUP BY
si.item_id
ORDER BY
si.`found` DESC
LIMIT 0,10
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE si index PRIMARY,stream_2_item,stream_id_found stream_2_item 4 \N 663236 Using temporary; Using filesort
1 SIMPLE ui eq_ref user_id_item_id,user_2_item,user_id_read_saved_hidden,user_id_saved,user_id_hidden user_id_item_id 8 const,si.item_id 1 Using where
1 SIMPLE i eq_ref PRIMARY PRIMARY 4 si.item_id 1
I have a table which represents the n:m relationship between stream and items.
An unique item can be accessible via multiple streams.
Based upon this i have a state-table for user dependent states of an item.
Currently, the user-state-entry will only be generated, if the item was read or saved the first tim. Otherwise, there is no entry in the user_item
table.
In the beginning, this query was fine but now, it takes a lot of time. Even if I leave out the "INNER JOIN
" it takes 7 seconds on the current tables with around 500k rows each.
The table-structure in the background is as follows:
- stream
- stream_item //n:m relation between stream and items
- item //unique items
- stream_group //user-defined groups of multiple streams
- stream_group_streams //_n:m relation between stream_groups and streams_
- user
- user_item //state-table that holds the user-dependent states of an item – if there is no state, no row exists for an item
I already asked this question at Stackoverflow but added my 2 Edits with further information to late so that I haven't got an answer anymore.
Best Answer
In general you can replace a
NOT IN (...)
subquery with aNOT EXISTS (...)
and the latter will be faster. Logically they are the same though some database optimizers handle theNOT EXISTS
better as they can stop the subquery check after the first match.In contrast, if the DB optimizer isn't smart enough to figure out what your doing and the subquery has a lot of rows, then it must buffer all the rows for each subquery before performing each
NOT IN
check.Try out something like this:
In this example the optimizer need only check if a single row exists for the matching subquery, including the additional filter on
b.user_id = 2
. It should use an index on theindex_field
anduser_id
if they are available.