MySQL query to evaluate ranges in date and time elements of a datetime column separately

datetimeMySQLselect

TABLE

EventID, EventTime (DATETIME)

1       2013-09-29 23:55:00.0
2       2013-10-01 00:05:00.0
3       2013-09-29 23:55:00.0
4       2013-09-29 23:45:00.0
5       2013-10-02 23:05:00.0
6       2013-09-26 23:50:00.0
7       2013-09-25 23:55:00.0
8       2013-09-24 22:55:00.0
9       2013-09-29 00:10:00.0
10      2013-09-27 23:45:00.0
11      2013-08-23 05:10:00.0
12      2013-08-22 05:15:00.0
13      2013-08-21 17:10:00.0
14      2013-08-22 19:10:00.0
15      2013-08-23 20:05:00.0
16      2013-08-20 10:05:00.0

Scenarios:

S.no#   EventTime       +/-day  +/-time(minutes)    expected rows

1   2013-09-29 23:55    0    0          1,3
2   2013-09-29 23:55    4    0          1,3,7
3   2013-09-29 23:50    0   10          1,3,4
4   2013-09-26 00:00    7   15          1,2,3,4,6,7,9,10
5   2013-09-24 22:55    1   60          7,8
6   2013-09-24 22:55    3   60          6,7,8,10
7   2013-08-22 09:00    0    0          -
8   2013-08-22 09:00    0    15         -
9   2013-08-20 09:00    3    60         - (#11-#16 fall under date range, but not under time range: 9-10)
10  2013-08-22 06:00    0    60         12
11  2013-08-21 05:00    2    15         11,12 (#16 fall under date range, but not under time range: 4:45 - 5:15)
12  2013-08-23 19:30    1    45         14,15
13  2013-08-21 15:00    2   60          - (#20, #21 an #22 fall under date range, but not under time range)

For given date and time, rows must be selected considering +/- day and/or +/- time (both are also user inputs). Note:

  1. Day may have range. If 0, same date must be considered.
  2. Time may have range. If 0, same time must be considered.
  3. Time range must be calculated for time component of date and must
    not be included in date range. Example: for scenario # 4, though it
    has 7 days range, but as time must be only around 15 minutes, 5th an
    8th rows were excluded.
  4. Days could be maximum +/- 7 days and time could be maximum +/- 180
    minutes.
  5. Seconds are always zero.

'Scenarios' is provided for explanation only. My requirement is to fetch only those records from EventTable which satisfy input for EventTime, day and time value. If day and time (can be positive number only) is non-zero value, query should return +/- of that value for EventTime, else for same day and time as EventTime.

I could narrow rows by:

SELECT * FROM EventTable 
WHERE EventTime BETWEEN StartEventTime AND EndEventTime
AND TIME( EventTime ) = TIME( StartEventTime )

StartEventTime and EndEventTime in YYYY-MM-DD HH:MM:SS format.

But, this matches exact time, I'm unble to +/- time.

Please help frame SQL statement. Thanks!

Best Answer

This actually turned out to be fairly painful. :) The details of this logic ran though my head most of last night... but in my current sleep-deprived state, I think I have devised a solution where every case is handled, and which does what we need in a reasonably-optimal fashion.

It returns the correct values for all of the test cases.

You need an index on (EventTime) so that the optimizer can turn the date-level tests into constants and use that index to find the matching rows by date range... then the server can filter the date-matched rows based on their time match in the CASE statement of the where clause.

In the WHERE clause, the CASE expression matches a condition in our data, and will yield a boolean value based on the truthiness of the expressions in the THEN. This value will include a row if TRUE and exclude the row if FALSE.

I've used variables, but that's mostly for clarity... assuming there's any clarity to be found in what I've written, below, and to prevent errors since the literals are needed in so many different places.

SET @TargetDateTime = '2013-09-26 00:00:00';
SET @DateRange = 7;
SET @TimeRange = 15;

SELECT E.*
 FROM EventTable E
      -- must be at or after midnight the morning of the first valid day
WHERE EventTime >= DATE_SUB(DATE(@TargetDateTime), INTERVAL @DateRange DAY)
      -- must be before midnight on the morning after the first valid day 
  AND EventTime <  DATE_ADD(DATE(@TargetDateTime), INTERVAL (@DateRange + 1) DAY)
      -- now, hold my beer and watch this...
  AND CASE 
  -- easy first case, the time portions of both values are equal
  WHEN TIME(EventTime) = TIME(@TargetDateTime) 
  -- in each case if the expression in the "THEN" evaluates to something true, 
  -- then the row will be returned if the other conditions in WHERE are also met
  THEN TRUE
  -- next case, if the event appears to be earlier than the target, 
  -- we match the row IF either...
  WHEN TIME(EventTime) < TIME(@TargetDateTime)
  -- ...the simple case, difference in seconds is within range
  THEN TIME_TO_SEC(@TargetDateTime) - TIME_TO_SEC(EventTime)
       BETWEEN 0 AND (@TimeRange * 60)
       OR 
  -- ...the complicated case, when we span midnight 
       86400 + TIME_TO_SEC(EventTime) - TIME_TO_SEC(@TargetDateTime) <= (@TimeRange * 60)
  -- next case, event appears later than target, we match if either...
  WHEN TIME(EventTime) > TIME(@TargetDateTime)
  -- ... simple subtraction returns a value in range
  THEN TIME_TO_SEC(EventTime) - TIME_TO_SEC(@TargetDateTime)
       BETWEEN 0 AND (@TimeRange * 60)
       OR
  -- ... we wrap around midnight
      86400 + TIME_TO_SEC(@TargetDateTime) - TIME_TO_SEC(EventTime) <= (@TimeRange * 60)
  -- ELSE isn't encountered unless we run into null values, in which case, no match
  ELSE FALSE END; /* END of CASE expression */

A working demo of the final version is here: http://sqlfiddle.com/#!2/7b8e5/2

It is usually bad form in a WHERE clause to use columns as arguments to functions, because this prevents the optimizer from using an available index, since it has to evaluate the data in every row of the table in order to evaluate the function. There's no harm, here, though, since an index can't help us match the times. This is because DATETIME values are on the left hand side of the field, as stored, and the time values are on the right, and an index stops helping as soon as we encounter irrelevant information from left to right. (That's usually discussed in the context of multi-column indexes, but it also holds true, here.)

It's also often bad form in a WHERE clause to use a CASE expression because that too may prevent the optimizer from choosing the optimal path because the CASE can prevent the optimizer from forming a proper understanding of the logic, but this is acceptable here because we have nothing but AND in the WHERE clause and two other conditions that are easily satisfied by an index.


Update: This code and the answer went through several iterations before arriving at its current (working) form, as you can see in the comments.

Essentially, what we are doing is considering the three cases than can be identified when trying to compare wall-clock time of day: the times are equal, A is earlier than B, or B is earlier than A. It seems like "absolute value" might come into play, but the problem we encounter is that when a time value has no date component, earlier and later aren't meaningful concepts when the times might span midnight. Without a date component, 00:05:00 and 23:55:00 are 23 hours and 50 minutes apart. With a date component, they could also be "n days and 10 minutes" apart. When the date component is missing but uninteresting (or already handled, in this case), we first check the simple case, subtracting the "earlier" from the "later" to see if they're within range, but when they're not, we add 86400 seconds to the "earlier" component, essentially making it "the same time, the following day"... then the subtraction can be done to see if the times are close enough together when they're calculated as having spanned midnight.