Sql-server – What’s the best way to iterate through the table to update a column in each row

sql serversql-server-2012update

I have a table containing some 3.2 billion rows. It has become necessary to add in a column titled [Daylight] with a datatype of bit. I ended up writing a statement to iterate through each station, for each day of the year. This is taking so insanely long, that I'm now exploring other options, along with the added benefit of (hopefully) learning something along the way. I have toyed with the idea of creating a secondary table that would hold the values for sunrise and sunset for each day of the year, but that would also be a large table (over 30,000 stations, with a range of 1-50+ years of data collection).

Is there a better way to go about this task? I'm really just looking for a finger pointed in the right direction as it's been over 10 years since I've used SQL. In the near future, there will be additional tables created to store various averages of the data in which day/night will become important. While I know math should be done by the front-end or at query time, this is all static data that will not be changing, so I figured it would be best to store it rather than compute it many times.

Below is the script I'm currently running to update the column (assuming it's even actually working, as the table is now locked).

Thank you in advance for any help or suggestions!

declare @station int
declare @maxstation int
declare @lat float
declare @long float
declare @month char(2)
declare @maxmonth char(2)
declare @day char(2)
declare @maxday char(2)
declare @year char(4)
declare @maxyear char(4)
declare @time char(4)
declare @date char(50)
declare @sunrise datetime
declare @sunset datetime
declare @offset int

create table #station_sunrise(Sunrise datetime)
create table #station_sunset(Sunset datetime)

set @station = (select min(StationIncrementalID)
                from station_data)

set @maxstation = (select max(StationIncrementalID)
                    from station_data)

while @station <= @maxstation

begin

    set @lat = (select cast(Latitude as float) from station_data
                    where StationIncrementalID = @station) 

    set @long = (select cast(Longitude as float) from station_data
                    where StationIncrementalID = @station)

    if (@lat is not null and @long is not null)

    begin

        set @year = (select min([Year]) from table
                        where Station = @station)

        set @maxyear = (select max([Year]) from table
                        where Station = @station)

        if (@year <= @maxyear and exists (select [Year] from table
                                            where Station = @station and
                                            [Year] = @year)
                                            )

        begin

            set @month = (select min([Month]) from table
                            where Station = @station and
                            [Year] = @year)

            set @maxmonth = (select max([Month]) from table
                            where Station = @station and
                            [Year] = @year)

            if (@month <= @maxmonth and exists (select [Year], [Month] from table
                                                    where Station = @station and
                                                    [Year] = @year and
                                                    [Month] = @month)
                                                    )

            begin

                set @day = (select min([Day]) from table
                                where Station = @station and
                                [Year] = @year and
                                [Month] = @month)

                set @maxday = (select max([Day]) from table
                                where Station = @station and
                                [Year] = @year and
                                [Month] = @month)

                if (@day <= @maxday and exists (select [Year], [Month], [Day] from table
                                                    where Station = @station and
                                                    [Year] = @year and
                                                    [Month] = @month and
                                                    [Day] = @day)
                                                    )

                begin

                    delete from #station_sunrise
                    delete from #station_sunset

                    set @date = concat(@month,'/',@day,'/',@year)

                    insert into #station_sunrise
                        exec DetermineSunrise @date, @lat, @long, @offset
                    insert into #station_sunset
                        exec DetermineSunset @date, @lat, @long, @offset

                    set @sunrise = (select replace(convert(char(4),(select top 1 [Sunrise] from #station_sunrise),108), ':', ''))
                    set @sunset = (select replace(convert(char(4),(select top 1 [Sunset] from #station_sunset),108), ':', ''))

                    update table
                        set [Daylight] = case
                                            when table.Station = @station and
                                                table.[Year] = @year and
                                                table.[Month] = @month and
                                                table.[Day] = @day and
                                                table.[Time] between @sunrise and @sunset
                                                then 1
                                            when table.Station = @station and
                                                table.[Year] = @year and
                                                table.[Month] = @month and
                                                table.[Day] = @day and
                                                (table.[Time] < @sunrise or
                                                    table.[Time] > @sunset)
                                                then 0
                                            else null
                                            end

                    set @day = @day + 1

                end

                else

                begin

                    set @day = @day + 1

                end

                set @month = @month + 1

            end

            else

            begin

                set @month = @month + 1

            end

            set @year = @year + 1

        end

        else

        begin

            set @year = @year + 1

        end

        set @station = @station + 1

    end

    else

    begin

        set @station = @station + 1

    end

end

Edited for requested information. Thank you @JoeObbish.

CREATE TABLE [dbo].[weather_data](
    [Station] [int] NOT NULL,
    [Year] [char](4) NOT NULL,
    [Month] [char](2) NOT NULL,
    [Day] [char](2) NOT NULL,
    [Time] [char](4) NOT NULL,
    [Daylight] [bit] NULL,
    [WindDirection] [smallint] NULL,
    [WindSpeed] [smallint] NULL,
    [WindGust] [smallint] NULL,
    [CloudCeiling] [smallint] NULL,
    [SkyCover] [char](3) NULL,
    [LowCloudType] [smallint] NULL,
    [MiddleCloudType] [smallint] NULL,
    [HighCloudType] [smallint] NULL,
    [Visibility] [decimal](5, 2) NULL,
    [ManuallyObserved1] [char](2) NULL,
    [ManuallyObserved2] [char](2) NULL,
    [ManuallyObserved3] [char](2) NULL,
    [ManuallyObserved4] [char](2) NULL,
    [AutoObserved1] [char](2) NULL,
    [AutoObserved2] [char](2) NULL,
    [AutoObserved3] [char](2) NULL,
    [AutoObserved4] [char](2) NULL,
    [PastWeatherIndicator] [char](3) NULL,
    [Temperature] [decimal](5, 2) NULL,
    [DewPoint] [decimal](8, 2) NULL,
    [SeaLevelPressure] [decimal](6, 2) NULL,
    [Altimeter] [decimal](6, 2) NULL,
    [StationPressure] [decimal](6, 2) NULL,
    [MaxTemp] [smallint] NULL,
    [MinTemp] [smallint] NULL,
    [Precip1Hour] [decimal](6, 2) NULL,
    [1HourTrace] [bit] NULL,
    [Precip6Hour] [decimal](6, 2) NULL,
    [6HourTrace] [bit] NULL,
    [Precip24Hour] [decimal](6, 2) NULL,
    [24HourTrace] [bit] NULL,
    [PrecipOther] [decimal](6, 2) NULL,
    [OtherTrace] [bit] NULL,
    [SnowDepth] [smallint] NULL,
    [SnowTrace] [bit] NULL
) ON [weather_data]
GO

ALTER TABLE dbo.weather_data ADD CONSTRAINT PK_weather_data PRIMARY KEY NONCLUSTERED ([Station], [Year], [Month], [Day], [Time]) WITH
( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_AutomaticallyObserved1] FOREIGN KEY([AutoObserved1])
REFERENCES [dbo].[automatically_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved1]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_AutomaticallyObserved2] FOREIGN KEY([AutoObserved2])
REFERENCES [dbo].[automatically_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved2]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_AutomaticallyObserved3] FOREIGN KEY([AutoObserved3])
REFERENCES [dbo].[automatically_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved3]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_AutomaticallyObserved4] FOREIGN KEY([AutoObserved4])
REFERENCES [dbo].[automatically_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved4]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_HighCloudType] FOREIGN KEY([HighCloudType])
REFERENCES [dbo].[high_cloud_type] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_HighCloudType]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_LowCloudType] FOREIGN KEY([LowCloudType])
REFERENCES [dbo].[low_cloud_type] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_LowCloudType]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_ManuallyObserved1] FOREIGN KEY([ManuallyObserved1])
REFERENCES [dbo].[manually_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved1]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_ManuallyObserved2] FOREIGN KEY([ManuallyObserved2])
REFERENCES [dbo].[manually_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved2]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_ManuallyObserved3] FOREIGN KEY([ManuallyObserved3])
REFERENCES [dbo].[manually_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved3]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_ManuallyObserved4] FOREIGN KEY([ManuallyObserved4])
REFERENCES [dbo].[manually_observed] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved4]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_MiddleCloudType] FOREIGN KEY([MiddleCloudType])
REFERENCES [dbo].[middle_cloud_type] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_MiddleCloudType]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_PastWeather] FOREIGN KEY([PastWeatherIndicator])
REFERENCES [dbo].[past_weather] ([ID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_PastWeather]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_SkyCover] FOREIGN KEY([SkyCover])
REFERENCES [dbo].[sky_cover] ([Code])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_SkyCover]
GO

ALTER TABLE [dbo].[weather_data]  WITH CHECK ADD  CONSTRAINT [FK_Station] FOREIGN KEY([Station])
REFERENCES [dbo].[station_data] ([StationIncrementalID])
GO

ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_Station]
GO

Just for full disclosure and in case anyone else is silly enough to make the same mistakes I did.. I ended up changing the stored procedures called in my initial script into UDF's, then wrote another UDF that called those two and performed the various conditional statements I was looking for. So, rather than the overly verbose script above, I instead run something similar to the following.

update a
set a.Daylight = dbo.udf_DetermineDaylight((concat(a.[Month],'/'[Day],'/',a.
     [Year])), a.[Time], b.[Latitude], b.[Longitude], 
     dbo.udf_getUTCOffSet())
from test a
left join station_data b
on a.Station = b.StationIncrementalID

This was ran on a testing table that contained 10 million rows without any keys or indexes and was able to successfully (and accurately) update the field in each row in 34 minutes and 59 seconds. Much better performance than roughly 8 hours for 300k rows.

Again, thank you for the advise and the guidance. It is truly appreciated.

Best Answer

Your code won't work as is. You're missing a lot of WHILE loops. Also, your UPDATE query will update every row in the table. Not only will that take a long time, but you'd be left with wrong results at the end of it even if the rest of your code was fixed.

I'm not aware of any easy answers for updating a column in an unpartitioned heap with billions of rows, but my advice to you is as follows:

  1. Test your code until you're confident that you have something that works.
  2. Have your code process a single station. Picking the station with the most rows is a good choice.
  3. Measure how long that code takes and multiply it by the number of stations. Is that within your maintenance window? If so, great, run the full code and be done with it.
  4. If your code is too slow, measure each part to figure out where the bottleneck is, look at execution plans, and try to figure out where changes need to be made.

You'll get much better help if you can identify a specific part of the code that you need help with. Good luck.