Mysql – WHERE on SQL with JOIN on large tables

join;MySQLwhere

We have a set of large tables (millions of records) on a MySQL DB, having schema like so (simplified):

T1: id, uid, rid, text1, text2, int1, int2, ...
T2: id, rid, tag_id, created_at
T3: id, owner_id, tag_name

Indices:

T1: Primary(id), unique(uid,rid), index(rid), index(uid,int1)
T2: Primary(id), unique(tag_id,rid), index(tag_id), index(created_at)
T3: Primary(id), unique(owner_id,tag_name)

And a requirement to do a select which returns 'rids' having tag_name = XX but not YY:

SELECT t1.rid 
FROM t1 
LEFT JOIN t2 ON t1.rid = t2.rid 
LEFT JOIN t3 ON t2.tag_id = t3.id 
WHERE t1.uid = 123
AND t1.int1 = 3
AND t3.tag_name eq 'XX'
AND t3.tag_name != 'YY'
LIMIT 100

This naturally does not work, since the WHERE does not eliminate an rid having more than one tag. How can we achieve this with performance in mind for the large tables?

More about data:

A user represented by uid will have about 100,000 records in T1, out of which about 10% have T2 records 10,000 (rids which are tagged), and less than 10 tags in T3.
There are 1000s of users (uids) in T1.

A given rid can be one of:

  1. ) Has a single tag –> a single T2 record
  2. ) Has multiple tags –> Multiple t2 records
  3. ) Has no tags –> 0 T2 records

We can also alter the table structure and indices for T2 and T3 to accommodate for that, as long as we maintain ability to filter 'tags' and 'T2' creation time.

Best Answer

You can use NOT EXISTSas:

SELECT t2.rid
FROM t2
WHERE NOT EXISTS (
    SELECT 1 
    FROM t3
    WHERE t2.tag_id = t3.id
      AND t3.tag_name <> 'YY'
)
AND EXISTS (
    SELECT 1 
    FROM t3
    WHERE t2.tag_id = t3.id
      AND t3.tag_name <> 'XX'
);

Your index T2:index(tag_id) is already covered by T2:unique(tag_id,rid) som you can get rid of that

I don't work much with MySQL, but I get the impression that JOINs are often preferred over EXISTS/NOT EXISTS. Translating the query (Note the DISTINCT):

SELECT DISTINCT t2.rid
FROM t2
JOIN t3 AS t31
    ON t2.tag_id = t31.id
   AND t31.tag_name = 'XX'
LEFT JOIN t3 AS t32
    ON t2.tag_id = t32.id
   AND t32.tag_name <> 'YY'
WHERE t32.id IS NULL;