The current database assumes all times are local to the user. Now we're changing this so the database stores everything in UTC and our application converts the UTC time in the database to the correct timezone for each user.
Now we need to update every datetime in the database to UTC time. Is there a better way than having our application (which has a Timezone library) update each one individually?
Due to daylight savings time (BST), a simple DATEADD isn't an option. There are roughly 10 million cells to update.
Best Answer
Assuming you are dealing with a single time zone at a time (e.g. British Time), and can identify all rows that are in British Time (maybe that's everyone so far), you could create a calendar table that has the DST time change dates for Summer Time. You subtract an hour for all times that fall between, and don't touch the ones that don't. Sample data:
Now you can identify the ranges using some rather tricky T-SQL. We grab some rows from system tables based on the years your data needs to cover, then for each year we calculate the beginning and end of BST. We can then use those outputs to update the main table:
As written, this just shows you the join and identifies the rows that will be updated with a different UTC value (they will have non-null BSTStart/BSTEnd values). If you uncomment
-- UPDATE d SET
and commentSELECT *,
, then select the data fromdbo.SampleData
, you should see this:(And actually, the March 26 1:01 local time shouldn't be possible - at 1:00 AM, the clocks rolled forward to 2 AM, so how could it ever have been 1:01 AM locally? So, you may want to special case that hour and deal with those cases individually. This is why every system I design stores UTC from the start - there is no plausible reason to store local time, since I can always get that from UTC, but I can't always get back.)
Of course, this will affect all rows that match the
WHERE
clause, it's just that some of the rows won't have a different value forUTC
. I highly recommend putting this into a different column first, rather than just changing the start time, so that you maintain the original data in case something goes wrong, you need to troubleshoot, you do end up needing that data later, etc. You can change the column names if you want to use the UTC value in existing queries instead of changing those one-by-one.This will work best if you have supporting indexes on the underlying table. If you are performing this update against multiple time zones (and have a column that can identify rows by local time zone), then creating a filtered index for each operation might be something worth trying. If you are identifying by some other indirection (such as UserID), maybe you could do a user at a time, or the set of users from each time zone. Chunking will work best in terms of blocking and log impact, but having no other details about the structure and how you could partition the data, that's the best I can do.
I provided British Time above, since it was the example in the question, but you'll obviously need different calculations for different timezones that have different offsets and different start/end dates (and those may differ from year to year, for example in 2007 the US dates changed).
This is based on the fact that the UK (well, London specifically) moves forward to 2 AM BST on the last Sunday of March at 1 AM local (or UTC), and back to 1 AM GMT the last Sunday of October at 2 AM local (1 AM UTC). Corroborate here; link courtesy @JackDouglas.