I have a table containing some 3.2 billion rows. It has become necessary to add in a column titled [Daylight] with a datatype of bit. I ended up writing a statement to iterate through each station, for each day of the year. This is taking so insanely long, that I'm now exploring other options, along with the added benefit of (hopefully) learning something along the way. I have toyed with the idea of creating a secondary table that would hold the values for sunrise and sunset for each day of the year, but that would also be a large table (over 30,000 stations, with a range of 1-50+ years of data collection).
Is there a better way to go about this task? I'm really just looking for a finger pointed in the right direction as it's been over 10 years since I've used SQL. In the near future, there will be additional tables created to store various averages of the data in which day/night will become important. While I know math should be done by the front-end or at query time, this is all static data that will not be changing, so I figured it would be best to store it rather than compute it many times.
Below is the script I'm currently running to update the column (assuming it's even actually working, as the table is now locked).
Thank you in advance for any help or suggestions!
declare @station int
declare @maxstation int
declare @lat float
declare @long float
declare @month char(2)
declare @maxmonth char(2)
declare @day char(2)
declare @maxday char(2)
declare @year char(4)
declare @maxyear char(4)
declare @time char(4)
declare @date char(50)
declare @sunrise datetime
declare @sunset datetime
declare @offset int
create table #station_sunrise(Sunrise datetime)
create table #station_sunset(Sunset datetime)
set @station = (select min(StationIncrementalID)
from station_data)
set @maxstation = (select max(StationIncrementalID)
from station_data)
while @station <= @maxstation
begin
set @lat = (select cast(Latitude as float) from station_data
where StationIncrementalID = @station)
set @long = (select cast(Longitude as float) from station_data
where StationIncrementalID = @station)
if (@lat is not null and @long is not null)
begin
set @year = (select min([Year]) from table
where Station = @station)
set @maxyear = (select max([Year]) from table
where Station = @station)
if (@year <= @maxyear and exists (select [Year] from table
where Station = @station and
[Year] = @year)
)
begin
set @month = (select min([Month]) from table
where Station = @station and
[Year] = @year)
set @maxmonth = (select max([Month]) from table
where Station = @station and
[Year] = @year)
if (@month <= @maxmonth and exists (select [Year], [Month] from table
where Station = @station and
[Year] = @year and
[Month] = @month)
)
begin
set @day = (select min([Day]) from table
where Station = @station and
[Year] = @year and
[Month] = @month)
set @maxday = (select max([Day]) from table
where Station = @station and
[Year] = @year and
[Month] = @month)
if (@day <= @maxday and exists (select [Year], [Month], [Day] from table
where Station = @station and
[Year] = @year and
[Month] = @month and
[Day] = @day)
)
begin
delete from #station_sunrise
delete from #station_sunset
set @date = concat(@month,'/',@day,'/',@year)
insert into #station_sunrise
exec DetermineSunrise @date, @lat, @long, @offset
insert into #station_sunset
exec DetermineSunset @date, @lat, @long, @offset
set @sunrise = (select replace(convert(char(4),(select top 1 [Sunrise] from #station_sunrise),108), ':', ''))
set @sunset = (select replace(convert(char(4),(select top 1 [Sunset] from #station_sunset),108), ':', ''))
update table
set [Daylight] = case
when table.Station = @station and
table.[Year] = @year and
table.[Month] = @month and
table.[Day] = @day and
table.[Time] between @sunrise and @sunset
then 1
when table.Station = @station and
table.[Year] = @year and
table.[Month] = @month and
table.[Day] = @day and
(table.[Time] < @sunrise or
table.[Time] > @sunset)
then 0
else null
end
set @day = @day + 1
end
else
begin
set @day = @day + 1
end
set @month = @month + 1
end
else
begin
set @month = @month + 1
end
set @year = @year + 1
end
else
begin
set @year = @year + 1
end
set @station = @station + 1
end
else
begin
set @station = @station + 1
end
end
Edited for requested information. Thank you @JoeObbish.
CREATE TABLE [dbo].[weather_data](
[Station] [int] NOT NULL,
[Year] [char](4) NOT NULL,
[Month] [char](2) NOT NULL,
[Day] [char](2) NOT NULL,
[Time] [char](4) NOT NULL,
[Daylight] [bit] NULL,
[WindDirection] [smallint] NULL,
[WindSpeed] [smallint] NULL,
[WindGust] [smallint] NULL,
[CloudCeiling] [smallint] NULL,
[SkyCover] [char](3) NULL,
[LowCloudType] [smallint] NULL,
[MiddleCloudType] [smallint] NULL,
[HighCloudType] [smallint] NULL,
[Visibility] [decimal](5, 2) NULL,
[ManuallyObserved1] [char](2) NULL,
[ManuallyObserved2] [char](2) NULL,
[ManuallyObserved3] [char](2) NULL,
[ManuallyObserved4] [char](2) NULL,
[AutoObserved1] [char](2) NULL,
[AutoObserved2] [char](2) NULL,
[AutoObserved3] [char](2) NULL,
[AutoObserved4] [char](2) NULL,
[PastWeatherIndicator] [char](3) NULL,
[Temperature] [decimal](5, 2) NULL,
[DewPoint] [decimal](8, 2) NULL,
[SeaLevelPressure] [decimal](6, 2) NULL,
[Altimeter] [decimal](6, 2) NULL,
[StationPressure] [decimal](6, 2) NULL,
[MaxTemp] [smallint] NULL,
[MinTemp] [smallint] NULL,
[Precip1Hour] [decimal](6, 2) NULL,
[1HourTrace] [bit] NULL,
[Precip6Hour] [decimal](6, 2) NULL,
[6HourTrace] [bit] NULL,
[Precip24Hour] [decimal](6, 2) NULL,
[24HourTrace] [bit] NULL,
[PrecipOther] [decimal](6, 2) NULL,
[OtherTrace] [bit] NULL,
[SnowDepth] [smallint] NULL,
[SnowTrace] [bit] NULL
) ON [weather_data]
GO
ALTER TABLE dbo.weather_data ADD CONSTRAINT PK_weather_data PRIMARY KEY NONCLUSTERED ([Station], [Year], [Month], [Day], [Time]) WITH
( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_AutomaticallyObserved1] FOREIGN KEY([AutoObserved1])
REFERENCES [dbo].[automatically_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved1]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_AutomaticallyObserved2] FOREIGN KEY([AutoObserved2])
REFERENCES [dbo].[automatically_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved2]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_AutomaticallyObserved3] FOREIGN KEY([AutoObserved3])
REFERENCES [dbo].[automatically_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved3]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_AutomaticallyObserved4] FOREIGN KEY([AutoObserved4])
REFERENCES [dbo].[automatically_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_AutomaticallyObserved4]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_HighCloudType] FOREIGN KEY([HighCloudType])
REFERENCES [dbo].[high_cloud_type] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_HighCloudType]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_LowCloudType] FOREIGN KEY([LowCloudType])
REFERENCES [dbo].[low_cloud_type] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_LowCloudType]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_ManuallyObserved1] FOREIGN KEY([ManuallyObserved1])
REFERENCES [dbo].[manually_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved1]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_ManuallyObserved2] FOREIGN KEY([ManuallyObserved2])
REFERENCES [dbo].[manually_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved2]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_ManuallyObserved3] FOREIGN KEY([ManuallyObserved3])
REFERENCES [dbo].[manually_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved3]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_ManuallyObserved4] FOREIGN KEY([ManuallyObserved4])
REFERENCES [dbo].[manually_observed] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_ManuallyObserved4]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_MiddleCloudType] FOREIGN KEY([MiddleCloudType])
REFERENCES [dbo].[middle_cloud_type] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_MiddleCloudType]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_PastWeather] FOREIGN KEY([PastWeatherIndicator])
REFERENCES [dbo].[past_weather] ([ID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_PastWeather]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_SkyCover] FOREIGN KEY([SkyCover])
REFERENCES [dbo].[sky_cover] ([Code])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_SkyCover]
GO
ALTER TABLE [dbo].[weather_data] WITH CHECK ADD CONSTRAINT [FK_Station] FOREIGN KEY([Station])
REFERENCES [dbo].[station_data] ([StationIncrementalID])
GO
ALTER TABLE [dbo].[weather_data] CHECK CONSTRAINT [FK_Station]
GO
Just for full disclosure and in case anyone else is silly enough to make the same mistakes I did.. I ended up changing the stored procedures called in my initial script into UDF's, then wrote another UDF that called those two and performed the various conditional statements I was looking for. So, rather than the overly verbose script above, I instead run something similar to the following.
update a
set a.Daylight = dbo.udf_DetermineDaylight((concat(a.[Month],'/'[Day],'/',a.
[Year])), a.[Time], b.[Latitude], b.[Longitude],
dbo.udf_getUTCOffSet())
from test a
left join station_data b
on a.Station = b.StationIncrementalID
This was ran on a testing table that contained 10 million rows without any keys or indexes and was able to successfully (and accurately) update the field in each row in 34 minutes and 59 seconds. Much better performance than roughly 8 hours for 300k rows.
Again, thank you for the advise and the guidance. It is truly appreciated.
Best Answer
Your code won't work as is. You're missing a lot of
WHILE
loops. Also, yourUPDATE
query will update every row in the table. Not only will that take a long time, but you'd be left with wrong results at the end of it even if the rest of your code was fixed.I'm not aware of any easy answers for updating a column in an unpartitioned heap with billions of rows, but my advice to you is as follows:
You'll get much better help if you can identify a specific part of the code that you need help with. Good luck.