Please provide SHOW CREATE TABLE; the explain is useless without it.
OR is a performance killer in many contexts.
( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR
( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ...
Turn that into
JOIN ( SELECT id FROM p WHERE
( p.pricelist = "name_abc" AND p.iln = "sellerID_123" ) OR
( p.pricelist = "name_def" AND p.iln = "sellerID_456" ) OR ... ) x ON x.id = foo.id
Also needed (on p):
INDEX(pricelist, iln, id)
(With the CREATEs, I could be more specific.)
The idea behind this "trick" is to move the costly work of the OR into a subquery that returns only the necessary ids. In addition, the index covers that subquery, so all of its work can be done in the index without touching the table rows.
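Put together, the rewrite might look like the sketch below. It is only an illustration: `foo` stands for the outer table from the snippet above, `p` is assumed to have a PRIMARY KEY named id, and the index name is made up.

```sql
-- Hypothetical index name; the column order matches the WHERE tests, with id last
-- so that the subquery can be answered from the index alone.
ALTER TABLE p ADD INDEX pricelist_iln_id (pricelist, iln, id);

-- The OR work happens inside the derived table x, which returns only ids.
SELECT foo.*
FROM foo
JOIN (
    SELECT id
    FROM p
    WHERE (p.pricelist = 'name_abc' AND p.iln = 'sellerID_123')
       OR (p.pricelist = 'name_def' AND p.iln = 'sellerID_456')
) AS x ON x.id = foo.id;
```

Running EXPLAIN on the rewritten query is the way to confirm whether the optimizer actually uses the new index for the derived table.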
Your query formatted a bit:
SELECT DATE_FORMAT(dateCreated, '%Y-%m-%d') AS date,
       eventPriority,
       COUNT(*) AS total
    FROM `events`
    WHERE customerName="Customer A"
      AND processed=1
      AND ticketStatus="CREATE"
      AND dateCreated >= "2015-07-01"
      AND dateCreated <= "2015-10-01"
    GROUP BY date, eventPriority
    ORDER BY `date` ASC;
The EXPLAIN shows that the trends index is used, and it is defined as
KEY `trends` (
`customerName`(50),
`processed`,
`ticketStatus`,
`dateCreated`,
`eventName`(128),
`eventPriority`
);
As we can see, the index contains all the columns of your query, ordered so that we can quickly find all events of "Customer A" which are processed and have status "CREATE", and then just read the right range of dateCreated. That seems really good, but as we can see from the EXPLAIN, there is one additional step after that: for every index entry that satisfies the conditions, MySQL takes the ID (the PRIMARY KEY is a part of every secondary index in InnoDB) and locates the row in the "main" part of the table. The values of those IDs are quite randomly distributed (even if we expect that the autoincrement roughly follows dateCreated, it is still probably not a continuous set, because events of many customers fall inside that date range), which means the query is doing ~140k random lookups by primary key to locate all the rows and build the temp table used for grouping.
The index contains all the columns used in your query (even eventPriority), so it looks like it might be covering, making this last step unnecessary. But because of the `customerName`(50) part, MySQL cannot be sure that the index lets it check the customerName value exactly: if there were a row with text longer than 50 characters, the index would contain only a part of it. We can see that the full value is not needed for "Customer A", as it is quite short, but this optimization is not yet implemented, so MySQL has to fetch the full row.
When you remove that prefix limit from the index, MySQL can see that it has all the data to resolve your query just by reading the index entries (as even eventPriority is included; eventName is too, but that could be left out). So you save tens of thousands of "random" lookups, because the index contains what you need and it is sorted in such a way that it can be read quickly.
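As a rough sketch of that change, the index could be rebuilt without the prefix on customerName. This assumes the column is short enough to be indexed in full; InnoDB caps the number of bytes per indexed column, and the cap depends on the row format and server version, so check the column definition first.

```sql
-- Rebuild the index with the full customerName column instead of customerName(50).
-- The other parts are kept exactly as in the original definition.
ALTER TABLE `events`
    DROP INDEX `trends`,
    ADD INDEX `trends` (
        `customerName`,
        `processed`,
        `ticketStatus`,
        `dateCreated`,
        `eventName`(128),
        `eventPriority`
    );

-- Re-check the plan. If the index is now covering for this query,
-- the Extra column should include "Using index".
EXPLAIN
SELECT DATE_FORMAT(dateCreated, '%Y-%m-%d') AS date,
       eventPriority,
       COUNT(*) AS total
FROM `events`
WHERE customerName = 'Customer A'
  AND processed = 1
  AND ticketStatus = 'CREATE'
  AND dateCreated >= '2015-07-01'
  AND dateCreated <= '2015-10-01'
GROUP BY date, eventPriority
ORDER BY `date` ASC;
```

The prefix on eventName does not get in the way here, because the query never reads that column.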
===
There might be one more possibility for making it even faster. Currently the GROUP BY date means grouping by the result of a function that has to be computed for every matching row, so it is not possible to use an index for that. You might create another column in the table for storing the result of DATE_FORMAT(dateCreated, '%Y-%m-%d'). It might be maintained by a trigger, by your application, or by a newer MySQL feature, generated columns. Then you would just add that column to your index (probably the best place would be just before dateCreated).
But it is probably not needed unless some customer generates a very large number of events in some date range, or you need to group over very long ranges.
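For illustration only, the generated-column variant might look like this. It assumes MySQL 5.7.6 or later and that dateCreated is a DATETIME; the column name dateCreatedDay is made up. It also uses DATE(dateCreated) instead of DATE_FORMAT, which gives the same calendar day but as a DATE value rather than a string.

```sql
-- Hypothetical stored generated column holding just the day part of dateCreated.
ALTER TABLE `events`
    ADD COLUMN `dateCreatedDay` DATE
        GENERATED ALWAYS AS (DATE(`dateCreated`)) STORED;

-- Place the new column just before dateCreated in the index.
ALTER TABLE `events`
    DROP INDEX `trends`,
    ADD INDEX `trends` (
        `customerName`, `processed`, `ticketStatus`,
        `dateCreatedDay`, `dateCreated`, `eventName`(128), `eventPriority`
    );

-- Filter and group on the precomputed day instead of calling a function per row.
-- Note: dateCreatedDay <= '2015-10-01' includes all of October 1st, whereas the
-- original dateCreated <= '2015-10-01' stopped at midnight; adjust as needed.
SELECT dateCreatedDay AS date,
       eventPriority,
       COUNT(*) AS total
FROM `events`
WHERE customerName = 'Customer A'
  AND processed = 1
  AND ticketStatus = 'CREATE'
  AND dateCreatedDay >= '2015-07-01'
  AND dateCreatedDay <= '2015-10-01'
GROUP BY dateCreatedDay, eventPriority
ORDER BY dateCreatedDay ASC;
```

Whether this removes the temporary table or filesort depends on the optimizer and the final index layout, so compare the EXPLAIN output before and after.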
Best Answer
Remember how SQL works: construct a set of tuples (rows) based on the given tables and joins, then filter those based on any where clause.
Because you have a where condition on the "right" table, your outer join is effectively converted into an inner join.
To retain the "outer-ness" of the join, put the filter condition into the join construct, rather than in the where clause:
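A minimal sketch of the difference, using hypothetical tables customers(id, name) and orders(id, customer_id, status) that are not taken from the question:

```sql
-- Filter in WHERE: customers without a 'shipped' order disappear, because for
-- them o.status is NULL and NULL = 'shipped' is never true.
-- The LEFT JOIN behaves like an INNER JOIN.
SELECT c.name, o.id AS order_id
FROM customers AS c
LEFT JOIN orders AS o
       ON o.customer_id = c.id
WHERE o.status = 'shipped';

-- Filter in ON: every customer is kept. Those without a 'shipped' order
-- appear once with order_id = NULL.
SELECT c.name, o.id AS order_id
FROM customers AS c
LEFT JOIN orders AS o
       ON o.customer_id = c.id
      AND o.status = 'shipped';
```

Conditions on the left table (here customers) still belong in the WHERE clause; only the conditions on the optional side need to move into ON.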