I have a table that stores available appointments for teachers, allowing two kinds of insertions:
-
Hourly based: with total freedom to add unlimited slots per day per teacher (as long as slots don't overlap): on 15/Apr a teacher may have slots at 10:00, 11:00, 12:00 and 16:00. A person is served after choosing a specific teacher time/slot.
-
Time period/range: on 15/Apr another teacher may work from 10:00 to 12:00 and then from 14:00 to 18:00. A person is served by order of arrival, so if a teacher works from 10:00 to 12:00, all persons that arrive in this period will be attended by order of arrival (local queue).
Since I have to return all available teachers in a search, I need all slots to be saved in the same table as the order of arrival ranges. This way I can order by date_from ASC, showing the first available slots first on the search results.
Current table structure
CREATE TABLE `teacher_slots` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`teacher_id` mediumint(8) unsigned NOT NULL,
`city_id` smallint(5) unsigned NOT NULL,
`subject_id` smallint(5) unsigned NOT NULL,
`date_from` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`date_to` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`status` tinyint(4) NOT NULL DEFAULT '0',
`order_of_arrival` tinyint(1) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `by_hour_idx` (`teacher_id`,`order_of_arrival`,`status`,`city_id`,`subject_id`,`date_from`),
KEY `order_arrival_idx` (`order_of_arrival`,`status`,`city_id`,`subject_id`,`date_from`,`date_to`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Search query
I need to filter by: actual datetime, city_id, subject_id and if a slot is available (status = 0).
For hourly based I have to show all available slots for the first closest available day for every teacher (show all time slots of a given day and can't show more than one day for the same teacher). (I got the query with the help from mattedgod).
For range based (order_of_arrival = 1), I have to show the closest available range, just one time per teacher.
First query runs individually in around 0.10 ms, second query 0.08 ms and the UNION ALL an average of 300ms.
(
SELECT id, teacher_slots.teacher_id, date_from, date_to, order_of_arrival
FROM teacher_slots
JOIN (
SELECT DATE(MIN(date_from)) as closestDay, teacher_id
FROM teacher_slots
WHERE date_from >= '2014-04-10 08:00:00' AND order_of_arrival = 0
AND status = 0 AND city_id = 6015 AND subject_id = 1
GROUP BY teacher_id
) a ON a.teacher_id = teacher_slots.teacher_id
AND DATE(teacher_slots.date_from) = closestDay
WHERE teacher_slots.date_from >= '2014-04-10 08:00:00'
AND teacher_slots.order_of_arrival = 0
AND teacher_slots.status = 0
AND teacher_slots.city_id = 6015
AND teacher_slots.subject_id = 1
)
UNION ALL
(
SELECT id, teacher_id, date_from, date_to, order_of_arrival
FROM teacher_slots
WHERE order_of_arrival = 1 AND status = 0 AND city_id = 6015 AND subject_id = 1
AND (
(date_from <= '2014-04-10 08:00:00' AND date_to >= '2014-04-10 08:00:00')
OR (date_from >= '2014-04-10 08:00:00')
)
GROUP BY teacher_id
)
ORDER BY date_from ASC;
Question
Is there a way to optimise the UNION, so I can get a reasonable response of a maximum ~20ms or even return range based + hourly based in just one query (with an IF, etc)?
SQL Fiddle: http://www.sqlfiddle.com/#!2/59420/1/0
EDIT:
I tried some denormalization by creating a field "only_date_from" where I stored only the date, so I could change this…
DATE(MIN(date_from)) as closestDay / DATE(teacher_slots.date_from) = closestDay
… to this
MIN(only_date_from) as closestDay / teacher_slots.only_date_from = closestDay
It already saved me 100ms! Still 200ms on average.
Best Answer
Firstly, I think your original query may not be "correct"; With reference to your SQLFiddle, it looks to me as though you should be returning rows with
ID
=2
,3
and4
(in addition to the row withID
=1
you are getting from this half), because your existing logic appears as though you intended for these other rows to be included, as they explicitly meet theOR (date_from >= '2014-04-10 08:00:00')
part of your secondWHERE
clause.The
GROUP BY teacher_id
clause in your second part of yourUNION
is causing you to lose those rows. This is because you're not actually aggregating any columns in your select list, and in this case theGROUP BY
will cause 'difficult to define' behaviour.Also, while I can't explain the poor performance of your
UNION
, I can work around it for you by outright removing it from your query:Rather than using two separate (and in parts, repeating) sets of logic to get rows from the same table, I've consolidated your logic into one query with the differences in your logic
OR
ed together - ie if a row meets one or the other of your originalWHERE
clauses, it's included. This is possible because I've replaced the(INNER) JOIN
you were using to find theclosestDate
with aLEFT JOIN
.This
LEFT JOIN
means we are now also able to distinguish which set of logic should be applied to a row; If the join works (closestDate IS NOT NULL) we apply your logic from the first half, but if the join fails (closestDate IS NULL) then we apply the logic from your second half.So this will return all the rows that your query returned (in the fiddle), and it's also picking up those additional ones.
Further, you can "tidy up" your query further so that you don't need to "plug in" your
status
,city_id
andsubject_id
parameters more than once.To do this, change the subquery
a
to also select those columns, and to also group on those columns. Then, theJOIN
'sON
clause would need to map those columns to theirts.xxx
equivalents.I don't think this will negatively effect performance, but couldn't be sure without testing on a large dataset.
So your join will look more like: