I'm running the below query and wanted to ask about how i could optimise this query. This query has been in use where i work for a while and i have the feeling that it could be optimised as its run regularly with different parameters for reporting purposes
First run takes around 2 minutes. Subsequent runs take around 30s as the query cache has kicked in i believe.
The tags
table has over 15million rows and contains tags for users
as well as other tables like companies
.
CREATE TABLE `tags` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`type` enum('company','review','job','sector','user') NOT NULL DEFAULT 'company',
`fk_id` int(11) DEFAULT '0',
`tag_key` varchar(24) DEFAULT NULL,
`tag_value` varchar(128) DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
KEY `tag_key_value` (`tag_key`,`tag_value`) USING BTREE,
KEY `fk_id_key` (`fk_id`,`tag_key`) USING BTREE,
FULLTEXT KEY `tag_value` (`tag_value`)
) ENGINE=InnoDB AUTO_INCREMENT=15725767 DEFAULT CHARSET=utf8
The users
table has around 450k rows.
CREATE TABLE `users` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(30) DEFAULT NULL,
`title` int(1) NOT NULL,
`firstname` varchar(50) NOT NULL,
`surname` varchar(50) NOT NULL,
`email` varchar(255) NOT NULL,
`password` varchar(255) DEFAULT NULL,
`university` int(11) NOT NULL DEFAULT '0',
`university_start_year` varchar(20) NOT NULL,
`university_end_year` varchar(20) NOT NULL,
`degree_discipline` varchar(3) DEFAULT NULL,
`signup_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`newsletter_subscription` int(1) NOT NULL DEFAULT '1',
...
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
PRIMARY KEY (`id`),
KEY `university` (`university`) USING BTREE,
KEY `university_start_year` (`university_start_year`) USING BTREE,
KEY `university_end_year` (`university_end_year`) USING BTREE,
KEY `degree_discipline` (`degree_discipline`) USING BTREE,
..
KEY `email` (`email`),
FULLTEXT KEY `firstname` (`firstname`,`surname`,`comments`,`email`)
) ENGINE=InnoDB AUTO_INCREMENT=1188075 DEFAULT CHARSET=utf8
I don't have the ability to change the table structure or indexes so any input on the optimising the query would be greatly appreciated. I thought that maybe using sub queries or optimising how the joins work would be the first point to address but i'm a bit lost of how i would tackle this.
SELECT u.firstname, u.email
FROM users AS u
LEFT JOIN tags as tags ON tags.fk_id = u.id AND tags.type = "user" AND tags.tag_key = "option"
LEFT JOIN tags as tags6 ON tags6.fk_id = u.id AND tags6.type = "user" AND tags6.tag_key = "role"
LEFT JOIN tags as tags7 ON tags7.fk_id = u.id AND tags7.type = "user" AND tags7.tag_key = "sector"
WHERE (user_type IN ("1")) AND
(tags.tag_key = "option" AND tags.tag_value IN ("1", "2", "8")) AND
((degree_discipline IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))
OR (tags6.tag_key = "role" AND tags6.tag_value IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))
OR (tags7.tag_key = "sector" AND tags7.tag_value IN ("ACC", "ACT", "ACF", "INS", "CBN", "INV", "RBN", "BNK", "ECO", "FIN", "MAT"))) AND
(u.signup_date >= "2012-09-01" ) AND
newsletter_subscription = "1"
GROUP BY u.email
Explain
If there's any other information that i can provide that would be helpful let me know.
Server Environment
These queries are being run on a replication slave for reporting purposes. This replication server has hourly backups run on it via mysqldump
which could be affecting the caches but thats only a guess at this point.
Best Answer
EAV (key-value) schema is clumsy and hard to optimize.
Please provide
SHOW CREATE TABLE
for each table.What table(s) are
newsletter_subscription
anddegree_discipline
in?These should help:
The Query cache, if invoked, will take only milliseconds, not 30s. It sounds like other I/O caching kicked in.
Also, try this... Try 3 queries, each with one of the
OR
clauses. While doing it, see if you can simplify each. For example, when fetching tag6, there is no need to join to tag7. If they add it up to less than 2 minutes (or 30 sec), we can discuss usingUNION
to finish out the query.A few more hints...