SQL Server – How to Update Every Datetime in Bulk

performancesql serversql-server-2008-r2

The current database assumes all times are local to the user. Now we're changing this so the database stores everything in UTC and our application converts the UTC time in the database to the correct timezone for each user.

Now we need to update every datetime in the database to UTC time. Is there a better way than having our application (which has a Timezone library) update each one individually?

Due to daylight savings time (BST), a simple DATEADD isn't an option. There are roughly 10 million cells to update.

Best Answer

Assuming you are dealing with a single time zone at a time (e.g. British Time), and can identify all rows that are in British Time (maybe that's everyone so far), you could create a calendar table that has the DST time change dates for Summer Time. You subtract an hour for all times that fall between, and don't touch the ones that don't. Sample data:

CREATE TABLE dbo.SampleData
(
  LocalDT DATETIME,
  IsBritishTime BIT,
  UTC DATETIME
);

INSERT dbo.SampleData(LocalDT,IsBritishTime) VALUES
 ('20000101 00:00',1), -- should NOT be changed
 ('20000326 00:59',1), -- should NOT be changed
 ('20000326 01:01',1), -- should go back
 ('20001029 01:59',1), -- should go back
 ('20001029 02:01',1), -- should NOT be changed
 ('20001231 23:59',1), -- should NOT be changed
 ('20000401 00:00',0); -- should NOT be changed (different time zone)

Now you can identify the ranges using some rather tricky T-SQL. We grab some rows from system tables based on the years your data needs to cover, then for each year we calculate the beginning and end of BST. We can then use those outputs to update the main table:

SET DATEFIRST 7;

;WITH y AS 
(
  -- all the years from 2000 through 50 years after the current year:
  SELECT TOP (YEAR(GETDATE())-2000+51) 
    y = DATEADD(DAY,-1,DATEADD(YEAR,number,'20000101'))
      FROM [master].dbo.spt_values 
      WHERE [type] = N'P' ORDER BY number
),
s AS
(
  SELECT 
    -- BST starts last Sunday in March @ 1:00 AM UTC:
    BSTStart = DATEADD(HOUR, 1, DATEADD(DAY,(1-DATEPART(
      WEEKDAY,DATEADD(MONTH,3,y))) % 7,DATEADD(MONTH,3,y))),
    -- and ends last Sunday in October @ 2:00 AM UTC:
    BSTEnd = DATEADD(HOUR,2,DATEADD(DAY,(1-DATEPART(
      WEEKDAY,DATEADD(MONTH,10,y))) % 7,DATEADD(MONTH,10,y)))
  FROM y
)
-- UPDATE d SET 
SELECT *,
  UTC = DATEADD(HOUR, 
  CASE WHEN s.BSTStart IS NULL THEN 0 ELSE -1 END, d.LocalDT)
FROM dbo.SampleData AS d
LEFT OUTER JOIN s 
ON d.LocalDT >= s.BSTStart
AND d.LocalDT < s.BSTEnd
WHERE d.IsBritishTime = 1 
-- or more likely WHERE EXISTS 
-- (SELECT 1 FROM dbo.Users 
--    WHERE UserID = d.UserID
--    AND TimeZone = 'BritishTime');

As written, this just shows you the join and identifies the rows that will be updated with a different UTC value (they will have non-null BSTStart/BSTEnd values). If you uncomment -- UPDATE d SET and comment SELECT *,, then select the data from dbo.SampleData, you should see this:

LocalDT          IsBritishTime UTC
---------------- ------------- ----------------
2000-01-01 00:00 1             2000-01-01 00:00
2000-03-26 00:59 1             2000-03-26 00:59
2000-03-26 01:01 1             2000-03-26 00:01 -- changed
2000-10-29 01:59 1             2000-10-29 00:59 -- changed
2000-10-29 02:01 1             2000-10-29 02:01
2000-12-31 23:59 1             2000-12-31 23:59
2000-04-01 00:00 0             NULL

(And actually, the March 26 1:01 local time shouldn't be possible - at 1:00 AM, the clocks rolled forward to 2 AM, so how could it ever have been 1:01 AM locally? So, you may want to special case that hour and deal with those cases individually. This is why every system I design stores UTC from the start - there is no plausible reason to store local time, since I can always get that from UTC, but I can't always get back.)

Of course, this will affect all rows that match the WHERE clause, it's just that some of the rows won't have a different value for UTC. I highly recommend putting this into a different column first, rather than just changing the start time, so that you maintain the original data in case something goes wrong, you need to troubleshoot, you do end up needing that data later, etc. You can change the column names if you want to use the UTC value in existing queries instead of changing those one-by-one.

This will work best if you have supporting indexes on the underlying table. If you are performing this update against multiple time zones (and have a column that can identify rows by local time zone), then creating a filtered index for each operation might be something worth trying. If you are identifying by some other indirection (such as UserID), maybe you could do a user at a time, or the set of users from each time zone. Chunking will work best in terms of blocking and log impact, but having no other details about the structure and how you could partition the data, that's the best I can do.

I provided British Time above, since it was the example in the question, but you'll obviously need different calculations for different timezones that have different offsets and different start/end dates (and those may differ from year to year, for example in 2007 the US dates changed).

This is based on the fact that the UK (well, London specifically) moves forward to 2 AM BST on the last Sunday of March at 1 AM local (or UTC), and back to 1 AM GMT the last Sunday of October at 2 AM local (1 AM UTC). Corroborate here; link courtesy @JackDouglas.