I have a table like this:
CREATE TABLE IF NOT EXISTS `jobs` (
`job_id` varchar(36) NOT NULL,
`job_status` varchar(30) NOT NULL,
`created_at` datetime NOT NULL,
`lease_date` datetime,
`priority` int NOT NULL,
PRIMARY KEY(`job_id`),
INDEX `job_status_priority_lease_date` (`job_status`, `priority`, `lease_date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
The full query I want to perform on the table would be the following:
select * from jobs
where job_status="IN_PROGRESS"
and lease_date<"10minutesago"
order by
priority
limit 100
I am implementing database as a job queue. 10 min is like the job time out, I only want to act on the jobs that are already timed out. Jobs has different priority so I was ordering by it. Also, I want to make my DB job queue behave like FIFO, that's why my index includes lease_date
.
I definitely want my results to be ordered by Priority, e.g. two chunks of the results with Priority 0 and 1. I don't need my results to be strictly ordered by lease_date
, as long as they are Ascending order in their own chunks that is good enough for me. e.g. result set:
priority lease_date
0 2017-08-19
0 2018-09-20
1 2016-08-20
1 2018-10-20
Is my index enough for that? How much can I optimize the query and build an index to boost the performance as much as possible?
Best Answer
By
"10minutesago"
do you meanNOW() - INTERVAL 10 MINUTE
?order by priority DESC
sorts only onpriority
. The ordering within it is unpredictable. It might be ascending, it might be descending, or seemingly random.Your
INDEX(job_status, priority, lease_date)
won't get as far aslease_date
. It will take care ofwhere job_status="IN_PROGRESS"
and it might then walk through the rows inpriority
order. But that's all.You could say
ORDER BY priority DESC, lease_date DESC
, butand lease_date < "10minutesago"
will not make use of the index.Think of building an index thus: First have any number of columns tested with
=
. Then have one range column.(After OP's edit)
Your example implies that you need
As for the question of efficiency, ...
The
WHERE
clause needsIf the optimizer would prefer to focus on the
ORDER BY
, thenNotes:
job_status
is first because it is tested with=
.lease_date
makes it possible to handle the entireWHERE
in the index. But there will still be an "filesort" to handle theORDER BY
.ORDER BY
with the second index (to avoid the filesort) may or may not backfire. It depends on how much of the data needs to be skipped over because of thelease_date
filter.If you have a thousand rows in the table, simply add both indexes; the query will be "fast enough". If you have a million rows, then you are stuck with a tough problem and should rethink the queuing mechanism. No matter what you do with the indexes, occasionally the wrong index will be used and the query will run painfully slow. Be aware that a hiccup in the system can make a queue sporadically jump from a civilized number of entries to a huge number.
Do you imply that this table is a "queue"? How long does it take to perform an item in the queue? If it is a "short" time, then "Don't queue it, just do it".
Add
priority
to WHEREIf you change to
then either of these becomes optimal:
See my Cookbook for discussion.
If you have a mixture of with and without
priority
, then have these two:The Optimizer will pick between them.