MySQL Query optimisation advice

MySQLoptimizationperformancequery-performance

I'm running the below query and wanted to ask about how i could optimise this query. This query has been in use where i work for a while and i have the feeling that it could be optimised as its run regularly with different parameters for reporting purposes

First run takes around 2 minutes. Subsequent runs take around 30s as the query cache has kicked in i believe.

The tags table has over 15million rows and contains tags for users as well as other tables like companies.

enter image description here

CREATE TABLE `tags` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`type` enum('company','review','job','sector','user') NOT NULL DEFAULT 'company',
`fk_id` int(11) DEFAULT '0',
`tag_key` varchar(24) DEFAULT NULL,
`tag_value` varchar(128) DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `tag_key_value` (`tag_key`,`tag_value`) USING BTREE,
KEY `fk_id_key` (`fk_id`,`tag_key`) USING BTREE,
FULLTEXT KEY `tag_value` (`tag_value`)
) ENGINE=InnoDB AUTO_INCREMENT=15725767 DEFAULT CHARSET=utf8

The users table has around 450k rows.

    CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) DEFAULT NULL,
`title` int(1) NOT NULL,
`firstname` varchar(50) NOT NULL,
`surname` varchar(50) NOT NULL,
`email` varchar(255) NOT NULL,
`password` varchar(255) DEFAULT NULL,
`university` int(11) NOT NULL DEFAULT '0',
`university_start_year` varchar(20) NOT NULL,
`university_end_year` varchar(20) NOT NULL,
`degree_discipline` varchar(3) DEFAULT NULL,
`signup_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`newsletter_subscription` int(1) NOT NULL DEFAULT '1',
...
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `university` (`university`) USING BTREE,
KEY `university_start_year` (`university_start_year`) USING BTREE,
KEY `university_end_year` (`university_end_year`) USING BTREE,
KEY `degree_discipline` (`degree_discipline`) USING BTREE,
..
KEY `email` (`email`),
FULLTEXT KEY `firstname` (`firstname`,`surname`,`comments`,`email`)
) ENGINE=InnoDB AUTO_INCREMENT=1188075 DEFAULT CHARSET=utf8

I don't have the ability to change the table structure or indexes so any input on the optimising the query would be greatly appreciated. I thought that maybe using sub queries or optimising how the joins work would be the first point to address but i'm a bit lost of how i would tackle this.

SELECT u.firstname, u.email
FROM users AS u
LEFT JOIN tags as tags ON tags.fk_id = u.id AND tags.type = "user" AND tags.tag_key = "option"
LEFT JOIN tags as tags6 ON tags6.fk_id = u.id AND tags6.type = "user" AND tags6.tag_key = "role"
LEFT JOIN tags as tags7 ON tags7.fk_id = u.id AND tags7.type = "user" AND tags7.tag_key = "sector"
WHERE (user_type IN ("1")) AND 
(tags.tag_key = "option" AND tags.tag_value IN ("1", "2", "8")) AND 
((degree_discipline IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))
OR (tags6.tag_key = "role" AND tags6.tag_value IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))
OR (tags7.tag_key = "sector" AND tags7.tag_value IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))) AND 
(u.signup_date >= "2012-09-01" ) AND 
newsletter_subscription = "1" 
GROUP BY u.email

Explain

enter image description here

If there's any other information that i can provide that would be helpful let me know.

Server Environment

These queries are being run on a replication slave for reporting purposes. This replication server has hourly backups run on it via mysqldump which could be affecting the caches but thats only a guess at this point.

Best Answer

EAV (key-value) schema is clumsy and hard to optimize.

Please provide SHOW CREATE TABLE for each table.

What table(s) are newsletter_subscription and degree_discipline in?

These should help:

tags:   INDEX(fk_id, type, tag_key)
users:  INDEX(newsletter_subscription, user_type, signup_date)

The Query cache, if invoked, will take only milliseconds, not 30s. It sounds like other I/O caching kicked in.

Also, try this... Try 3 queries, each with one of the OR clauses. While doing it, see if you can simplify each. For example, when fetching tag6, there is no need to join to tag7. If they add it up to less than 2 minutes (or 30 sec), we can discuss using UNION to finish out the query.

A few more hints...

( SELECT u.firstname, u.email
    FROM users AS u
    LEFT JOIN tags as tags ON tags.fk_id = u.id
                          AND tags.type = "user"
                          AND tags.tag_key = "option"
    LEFT JOIN tags as tags6 ON tags6.fk_id = u.id
             AND tags6.type = "user" AND tags6.tag_key = "role"
    LEFT JOIN tags as tags7 ON tags7.fk_id = u.id
              AND tags7.type = "user" AND tags7.tag_key = "sector"
    WHERE (user_type IN ("1"))
      AND (tags.tag_key = "option" AND tags.tag_value IN ("1", "2", "8"))
      AND ((degree_discipline IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))
) UNION DISTINCT
( SELECT ... same ...
    FROM ... same ...
    WHERE ... second part of OR ...
) UNION DISTINCT
( SELECT ... 3rd ...
) GROUP BY u.email