Mysql – Different results returned when ignoring Primary Key index. Daylight Savings Time related bug

MySQL

I'm hoping you can help explain this behavior or try to reproduce the issue so I can be confident filing a bug report.

Essentially, I get different results from these queries:

# returns 0
select count(*) as COUNT_WITH_INDEX
from a
where id = 1  and begin_time='2018-11-04 01:01:00.000';

# returns 1
select count(*) as COUNT_WITHOUT_INDEX
from a ignore index (PRIMARY)
where id = 1  and begin_time='2018-11-04 01:01:00.000';

The key difference is the use of ignore index (PRIMARY).

In case the date didn't stick out to you immediately, that date falls in the "Fall Behind" hour of Daylight Savings transition for 'US/Central' timezone. 1:01am occurred twice on 2018-11-04. I only found issues with timestamps that fall in this window, so I suspect its a bug with how DST rules are applied.

Regardless of whether or not I need to use convert_tz() to correctly get the date I want, there's still the fact that I get different results with and without the PRIMARY key index.

Full test case:

create database if not exists test_dt;
use test_dt;

drop table if exists a;

CREATE TABLE `a` (
  `id` int(11) NOT NULL,
  `begin_time` timestamp(3) NOT NULL DEFAULT '0000-00-00 00:00:00.000',
  PRIMARY KEY (`id`,`begin_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;


set TIME_ZONE='+0:00';

insert into a values(1, '2018-11-04 06:01:00.000');
insert into a values(1, '2018-11-04 07:01:00.000');

set TIME_ZONE='US/Eastern';

select * from a;

# returns 0
select count(*) as COUNT_WITH_INDEX
from a
where id = 1  and begin_time='2018-11-04 01:01:00.000';

# returns 1
select count(*) as COUNT_WITHOUT_INDEX
from a ignore index (PRIMARY)
where id = 1  and begin_time='2018-11-04 01:01:00.000';

set TIME_ZONE='US/Central';
# repeat w/ Central if you like

Side note: Install Time Zone data to use named time zones.

Tested on:: CentOS 7 w/ 5.6.36 and 5.6.43. I don't have 5.7 or 8.0 installations handy.

Another side note: The initial problem I encountered was that joins between parent-child tables were not returning data with composite PK (id, timestamp). Since timestamps are stored in UTC format, I wouldn't think DST dates would be an issue, but here I am.

Do you have an explanation for the behavior?
Do you think it's a bug?

Thanks!

Edit
Additional info per comments

What result do you get if you drop the p.k?

set TIME_ZONE='+0:00';

insert into a values(1, '2018-11-04 06:01:00.000');
insert into a values(1, '2018-11-04 07:01:00.000');


select * from a ;   

select 'Set time_zone=US/Central' as msg;
set TIME_ZONE='US/Central';
+------------------+
| COUNT_WITH_INDEX |
+------------------+
|                1 |
+------------------+
1 row in set (0.00 sec)

drop index `primary` on a;

select count(*) as COUNT_DROPPED_PK
from a
where id = 1  and begin_time='2018-11-04 01:01:00.000';


+------------------+
| COUNT_DROPPED_PK |
+------------------+
|                2 |
+------------------+
1 row in set (0.00 sec)

set TIME_ZONE='+0:00';

select * from a;

+----+-------------------------+
| id | begin_time              |
+----+-------------------------+
|  1 | 2018-11-04 06:01:00.000 |
|  1 | 2018-11-04 06:01:00.000 |
+----+-------------------------+

Notice If I change the TZ to US/Central, drop the PK, then set the TZ back to '+0:00', I can see that the data is screwed up. Both records are 06:01 time, when one was inserted 07:01. That behavior is not logical to me since the timestamps should always be UTC in the backend.

1) Show the output of set TIME_ZONE='US/Eastern'; select * from a; 2) What if set TIME_ZONE='-4:00';?

set TIME_ZONE='US/Eastern';
select * from a;
+----+-------------------------+
| id | begin_time              |
+----+-------------------------+
|  1 | 2018-11-04 01:01:00.000 |
|  1 | 2018-11-04 02:01:00.000 |
+----+-------------------------+

set TIME_ZONE='-4:00';
select * from a;
+----+-------------------------+
| id | begin_time              |
+----+-------------------------+
|  1 | 2018-11-04 02:01:00.000 |
|  1 | 2018-11-04 03:01:00.000 |
+----+-------------------------+

Best Answer

I've got an answer that I don't like :

Indexed searches convert the "value or range to be matched " into a UTC time to be compared against the indexed value.

Non-indexed searches convert the "stored value" from UTC to the session time zone for comparison.

That explains why I get different results with and without an index. To me it feels like the implementation is broken, but I don't know what I would do for an easy fix. /shrug

Fortunately we can work around the issue in this case, but we are reconsidering how to handle this in the future.