MySQL – How to Select Records Not Associated with a Specific Tag in Many-to-Many Relation

MySQL, performance

As shown in this sql-fiddle, I have a bridge table events_tags representing a many-to-many relation between events and tags.

I need the fastest way to find all events not associated with a given tag t.

The first query in the fiddle shows the incorrect solution, where an event assigned to both t and another tag is included in the result set.

The second query uses an anti-join (learned this from an answer by ypercubeᵀᴹ) and is correct, but it is too slow on my real schema: it returns 18K rows from the bridge table in ~130ms on the hardware our application runs on.
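The fiddle itself is not reproduced here, so the following is only a minimal sketch of the two behaviours described above. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the exact shape of both queries are assumptions rather than the original fiddle contents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE events_tags (event_id INTEGER, tag_id INTEGER);
    INSERT INTO events VALUES (1, 'e1'), (2, 'e2'), (3, 'e3');
    -- Hypothetical data: e1 has tag 1 (t) and tag 2, e2 has only tag 2, e3 has only tag 1
    INSERT INTO events_tags VALUES (1, 1), (1, 2), (2, 2), (3, 1);
""")

# Naive filter: keeps any event that has at least one tag other than t,
# so e1 slips through even though it is also tagged with t.
naive = conn.execute("""
    SELECT DISTINCT e.id
      FROM events e
      JOIN events_tags et ON et.event_id = e.id
     WHERE et.tag_id <> 1
     ORDER BY e.id
""").fetchall()

# Anti-join: look for a matching row with tag t and keep only events without one.
anti = conn.execute("""
    SELECT DISTINCT e.id
      FROM events e
      JOIN events_tags et ON et.event_id = e.id
      LEFT JOIN events_tags t ON t.event_id = e.id AND t.tag_id = 1
     WHERE t.event_id IS NULL
     ORDER BY e.id
""").fetchall()

print("naive:", naive)
print("anti-join:", anti)
```

With this sample data the naive query returns events 1 and 2, while the anti-join returns only event 2.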

I need this functionality in an endpoint of a RESTful API, whose total response time must stay <300ms.

Is there a faster approach I can use, or am I stuck with the fact that the "real" query has to deal with a derived table of 18K rows?

EDIT
I failed to mention that I will then combine this (as a subquery) with another query selecting only events matching some criteria.
I am not selecting 18K events 🙂

P.S.: Let me know if any other details about the problem are required.

Thanks

Best Answer

SELECT DISTINCT events.id, events.title
  FROM events 
  JOIN events_tags et1 
         on events.id = et1.event_id 
        and not exists ( select 1 
                           from events_tags et2 
                          where et2.event_id = et1.event_id
                            and et2.tag_id IN (1) );

And why an identity (auto-increment) PK in events_tags? Drop it.
Just use a composite PK of (tag_id, event_id), in that order.
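A minimal sketch of that table layout, again using Python's sqlite3 as a stand-in for MySQL (the column types and sample rows are assumptions). In MySQL's InnoDB the primary key is the clustered index, so a key that leads with tag_id can serve lookups by tag directly and also rejects duplicate pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Bridge table without a surrogate id: the pair itself is the key.
    CREATE TABLE events_tags (
        tag_id   INTEGER NOT NULL,
        event_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, event_id)
    );
    INSERT INTO events_tags VALUES (1, 10), (2, 10), (2, 20);
""")

# The composite key prevents the same (tag, event) pair from being stored twice.
try:
    conn.execute("INSERT INTO events_tags VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Lookup on the leading key column, as the NOT EXISTS subquery does.
events_with_tag_1 = conn.execute(
    "SELECT event_id FROM events_tags WHERE tag_id = ? ORDER BY event_id", (1,)
).fetchall()

print(duplicate_rejected, events_with_tag_1)
```

If queries also need to go from an event to its tags, a secondary index on (event_id, tag_id) would likely still be needed; whether that pays off depends on the real workload.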

This might be faster, but I doubt it:

SELECT DISTINCT events.id, events.title
  FROM events 
  JOIN events_tags et1 
         on et1.event_id = events.id  
        and et1.tag_id not IN (1)
        and not exists ( select 1 
                           from events_tags et2 
                          where et2.event_id = et1.event_id
                            and et2.tag_id IN (1)
                       );

I don't think MySQL supports EXCEPT (it was only added in MySQL 8.0.31), but where it is available:

SELECT events.id, events.title
  FROM events 
  JOIN (   select event_id  
           from events_tags
           where tag_id not IN (1)
         except
           select event_id  
           from events_tags
           where tag_id     IN (1)
       ) tt
    on tt.event_id = events.id;
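To check that the EXCEPT form gives the same rows as the NOT EXISTS form, here is a small sketch run against SQLite (which supports EXCEPT) through Python's sqlite3 module. The sample data is hypothetical, and nothing here says anything about relative speed on MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE events_tags (
        tag_id INTEGER NOT NULL,
        event_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, event_id)
    );
    INSERT INTO events VALUES (1, 'e1'), (2, 'e2'), (3, 'e3');
    -- Hypothetical data: e1 has tags 1 and 2, e2 has only tag 2, e3 has only tag 1
    INSERT INTO events_tags VALUES (1, 1), (2, 1), (2, 2), (1, 3);
""")

# Set difference: events carrying some other tag, minus events carrying tag 1.
with_except = conn.execute("""
    SELECT e.id, e.title
      FROM events e
      JOIN (SELECT event_id FROM events_tags WHERE tag_id NOT IN (1)
            EXCEPT
            SELECT event_id FROM events_tags WHERE tag_id IN (1)) tt
        ON tt.event_id = e.id
     ORDER BY e.id
""").fetchall()

# The NOT EXISTS form from the first query of this answer, for comparison.
with_not_exists = conn.execute("""
    SELECT DISTINCT e.id, e.title
      FROM events e
      JOIN events_tags et1 ON et1.event_id = e.id
       AND NOT EXISTS (SELECT 1
                         FROM events_tags et2
                        WHERE et2.event_id = et1.event_id
                          AND et2.tag_id IN (1))
     ORDER BY e.id
""").fetchall()

print(with_except, with_not_exists)
```

Both forms return only e2 here. Like the queries above, neither returns events that have no tags at all, because both start from rows in events_tags.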

Related Solutions

MySQL Unions/Subselects not utilizing keys from associated tables

The optimizer is not that clever - yet (see footer).

You could still use UNION, if you rewrite the query (a job that could/should be done by a decent optimizer):

    SELECT  
    `part_number`, 
    `part_manufacturer_clean`, 
    `part_number_clean`, 
    `part_heci`, 
    `part_manufacturer`, 
    `part_description`
    FROM `new_products` AS `a`
    WHERE `part_manufacturer_clean` = 'adc'
UNION 
    SELECT 
    `part` as `part_number`,
    `manulower` as `part_manufacturer_clean`,
    `partdeluxe` as `part_number_clean`,
    `heci` as `part_heci`,
    `manu` as `part_manufacturer`,
    `description` as `part_description`
    FROM `warehouse` AS `b`
    WHERE `manulower` = 'adc' ;
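The original question is not shown here, so the sketch below assumes it filtered the UNION result from the outside, and it trims the tables to two columns each. Using Python's sqlite3 with made-up rows, it only illustrates that pushing the predicate into each branch returns the same rows while letting each branch use its own index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE new_products (part_number TEXT, part_manufacturer_clean TEXT);
    CREATE TABLE warehouse (part TEXT, manulower TEXT);
    CREATE INDEX idx_np ON new_products (part_manufacturer_clean);
    CREATE INDEX idx_wh ON warehouse (manulower);
    -- Hypothetical rows
    INSERT INTO new_products VALUES ('A1', 'adc'), ('B1', 'cisco');
    INSERT INTO warehouse VALUES ('A2', 'adc'), ('A1', 'adc'), ('C1', 'nortel');
""")

# Assumed original shape: filter applied to the derived UNION table.
outer_filter = conn.execute("""
    SELECT part_number, part_manufacturer_clean
      FROM (SELECT part_number, part_manufacturer_clean FROM new_products
            UNION
            SELECT part, manulower FROM warehouse) AS u
     WHERE part_manufacturer_clean = 'adc'
     ORDER BY part_number
""").fetchall()

# Rewritten shape: the same predicate inside each branch of the UNION.
pushed_down = conn.execute("""
    SELECT part_number, part_manufacturer_clean
      FROM new_products
     WHERE part_manufacturer_clean = 'adc'
    UNION
    SELECT part, manulower
      FROM warehouse
     WHERE manulower = 'adc'
    ORDER BY part_number
""").fetchall()

print(outer_filter == pushed_down, pushed_down)
```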

You can also try the latest MariaDB versions (5.3 and 5.5), which have several optimizer improvements (if switching to MariaDB is an option).

You can also try MySQL 5.6 (still in beta), which has some optimizer improvements, too.

MySQL – Best solution for search scenario

It depends on the number of articles you expect to have. If you have indices on the columns you are filtering on, it will be pretty fast. Joining is very fast and will surely not be the problem.

But your WHERE clause is very expensive, because you are basically scanning all rows in the table and therefore loading all of them from disk to compare. This cannot be avoided in your case, and it will be the critical part. Nevertheless, if you aren't going past 100,000 articles I wouldn't be too concerned. Still, test it if you want to be sure!

However, your use case seems to be indexing documents (articles) and getting the top x of them based on a search query. A faster and more efficient approach is to create, for each document, a vector that records how often each word occurs. Your search query is transformed into such a vector too, and then it's all about comparing vectors (simplified).

You don't have to implement that yourself, of course; there are libraries for document indexing and searching. You may want to have a look at Lucene (or maybe you can even use Solr, which should do everything for you). Just to give you some ideas: with those tools you can build your own little 'Google'. Lucene would of course build its own (NoSQL) database/index, so you might need to separate this functionality from the rest of your application (I don't know the context).
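As a rough illustration of the vector idea, here is a small Python sketch that turns each text into a word-count vector and ranks articles by cosine similarity to the query. The helper names, the sample articles, and the choice of cosine similarity as the comparison are assumptions made for this example; real engines such as Lucene use more refined scoring.

```python
import math
from collections import Counter


def to_vector(text):
    """Word-count vector: maps each lowercased word to its number of occurrences."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query, articles, limit=3):
    """Return up to `limit` article titles sharing words with the query, best first."""
    q = to_vector(query)
    scored = [(cosine_similarity(q, to_vector(text)), title)
              for title, text in articles.items()]
    scored = [(score, title) for score, title in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:limit]]


# Hypothetical articles, keyed by title.
articles = {
    "mysql-joins": "fast joins in mysql with indexes",
    "lucene-intro": "full text search with lucene",
    "cooking": "how to bake bread",
}
result = top_matches("mysql indexes", articles)
print(result)
```

In practice the article vectors would be computed once and stored, rather than rebuilt for every query as this sketch does.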
