MySQL – Optimizing Query for Large Number of Rows

indexMySQLperformanceperformance-tuning

I have the following database table in MySQL :

enter image description here

and is running the following query

SELECT `scratch_tickets`.`id` AS ticket_id, `scratch_tickets`.`created_at`, `scratch_tickets`.`prize_type`, `scratch_tickets`.`prize`, `scratch_tickets`.`is_win`, `scratch_tickets`.`game_id` FROM (`scratch_tickets`)  WHERE `scratch_tickets`.`created_at` >='2015-02-12 00:00:00' and created_at<='2015-02-18 23:59:59' AND `scratch_tickets`.`status` = 4 LIMIT 50;

Now the EXPLAIN is giving following results

enter image description here

The query is taking almost 35 seconds to execute and when I use this with PHP, it is getting timed out. Here I think the indexes being utilized. How can I optimize this for a better performance ?

The table already have 25708614 records.

Best Answer

Your query:

SELECT scratch_tickets.id AS ticket_id, 
       -- more columns
FROM (scratch_tickets)  
WHERE scratch_tickets.created_at >='2015-02-12 00:00:00' 
  AND created_at <= '2015-02-18 23:59:59' 
  AND scratch_tickets.status = 4 
LIMIT 50 ;

needs a simple index on (status, created_at):

ALTER TABLE scratch_tickets
  ADD INDEX status_created_at_idx         -- pick a name for the index
  (status, created_at) ;

After testing that the query is using the new index and is more efficient, you can probably drop the old status index. ("probably" because you may have some rare query that profits a lot from a (status) index but is slow when using the (status, created_at) one.)

Some more notes:

  • Having LIMIT without ORDER BY is a recipe for indeterminate results. Always use ORDER BY so you are in control of which (50) rows will be chosen for return (unless you really don't mind semi-random results or there are serious performance issues).
  • Also note that adding or changing the ORDER BY expression may affect performance. For the specific query, the index will be used effectively if you have ORDER BY created_at or ORDER BY created_at DESC or ORDER BY created_at, ticket_id. If a different column or expression is used, the efficiency may not be as good.
  • Instead of inclusive (>= and <= - or BETWEEN) it's easier and less error-prone to use inclusive-exclusive ranges for datetime types. This way, you don't care much if the type is DATE, DATETIME or TIMESTAMP or if it has second, millisecond or microsecond accuracy.
  • Queries are more clear with aliases. See a rewrite:

    SELECT st.id AS ticket_id, 
           -- more columns
    FROM scratch_tickets AS st
    WHERE at.created_at >= '2015-02-12 00:00:00'       -- you could even use '2015-02-12'
      AND st.created_at <  '2015-02-19 00:00:00'       -- notice the   < 
      AND st.status = 4 
    ORDER BY <some_expression>
    LIMIT 50 ;