I have a table which contains raw data for various weather conditions. Each day has multiple data readings, some in 5 minute increments all the way up to hour long increments. I am looking to create a table that holds various averages as well as a bit field for certain conditions.
I found one post that I felt was close to my scenario (Confusing Query: Mixing Aggregates and Non-Aggregates) and both solutions gave various errors.
I suppose my question is: is there a way to exclude columns from a GROUP BY without using an aggregate function on them? Would I be better off creating temp tables for for each portion (overall averages, day averages, night averages, and bit fields for trace amounts) and then joining those together in the actual table insert? Or is there some neat trick I am completely missing altogether?
Thank you in advance for any fingers pointed in the right direction!
–
The create table scripts are as follows.
Original table containing raw data (foreign key on [Station] linking to an unrelated table:
CREATE TABLE [dbo].[raw_data](
[Station] [int] NOT NULL,
[Year] [char](4) NOT NULL,
[Month] [char](2) NOT NULL,
[Day] [char](2) NOT NULL,
[Time] [char](4) NOT NULL,
[Daylight] [bit] NULL,
[WindDirection] [smallint] NULL,
[WindSpeed] [smallint] NULL,
[WindGust] [smallint] NULL,
[CloudCeiling] [smallint] NULL,
[SkyCover] [char](3) NULL,
[LowCloudType] [smallint] NULL,
[MiddleCloudType] [smallint] NULL,
[HighCloudType] [smallint] NULL,
[Visibility] [decimal](5, 2) NULL,
[Temperature] [decimal](5, 2) NULL,
[DewPoint] [decimal](8, 2) NULL,
[SeaLevelPressure] [decimal](6, 2) NULL,
[Altimeter] [decimal](6, 2) NULL,
[StationPressure] [decimal](6, 2) NULL,
[MaxTemp] [smallint] NULL,
[MinTemp] [smallint] NULL,
[Precip1Hour] [decimal](6, 2) NULL,
[1HourTrace] [bit] NULL,
[Precip6Hour] [decimal](6, 2) NULL,
[6HourTrace] [bit] NULL,
[Precip24Hour] [decimal](6, 2) NULL,
[24HourTrace] [bit] NULL,
[PrecipOther] [decimal](6, 2) NULL,
[OtherTrace] [bit] NULL,
[SnowDepth] [smallint] NULL,
[SnowTrace] [bit] NULL
The destination table for this operation is as follows and has no key constraints currently, but will have a Foreign Key on [Station] and a composite Primary Key on Station, Year, Month, and Day once all data has been inserted.
create table station_daily(
Station int
,[Year] char(4)
,[Month] char(2)
,[Day] char(2)
,WindDirection decimal (6,2)
,WindSpeed decimal (6,2)
,WindGust decimal (6,2)
,CloudCeiling decimal (6,2)
,Visibility int
,Temperature decimal(5,2)
,DewPoint decimal(8,2)
,SeaLevelPressure decimal(6,2)
,Altimeter decimal(6,2)
,StationPressure decimal(6,2)
,MaxTemp decimal (6,2)
,MinTemp decimal (6,2)
,DayMaxTemp decimal (6,2)
,DayMinTemp decimal (6,2)
,DayAvgTemp decimal (6,2)
,NightMaxTemp decimal (6,2)
,NightMinTemp decimal (6,2)
,NightAvgTemp decimal (6,2)
,PrecipTotal decimal(6,2)
,PrecipTrace bit
,SnowDepth decimal (6,2)
,SnowTrace bit
,RecordCount int
These are my failed attempts at inserting the data into [station_daily].
The following script gave the error "Column 'weather_data.WindDirection' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
insert into station_daily
select [Station] as [Station]
,[Year] as [Year]
,[Month] as [Month]
,[Day] as [Day]
,avg(WindDirection) over (partition by [Station], [Year], [Month], [Day]) as WindDirection
,avg(WindSpeed) over (partition by [Station], [Year], [Month], [Day]) as WindSpeed
,avg(WindGust) over (partition by [Station], [Year], [Month], [Day]) as WindGust
,avg(CloudCeiling) over (partition by [Station], [Year], [Month], [Day]) as CloudCeiling
,avg(Visibility) over (partition by [Station], [Year], [Month], [Day]) as Visibility
,avg(Temperature) over (partition by [Station], [Year], [Month], [Day]) as Temperature
,avg(DewPoint) over (partition by [Station], [Year], [Month], [Day]) as DewPoint
,avg(SeaLevelPressure) over (partition by [Station], [Year], [Month], [Day]) as SeaLevelPressure
,avg(Altimeter) over (partition by [Station], [Year], [Month], [Day]) as Altimeter
,avg(StationPressure) over (partition by [Station], [Year], [Month], [Day]) as StationPressure
,avg(MaxTemp) over (partition by [Station], [Year], [Month], [Day]) as MaxTemp
,avg(MinTemp) over (partition by [Station], [Year], [Month], [Day]) as MinTemp
,(select max(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayMaxTemp
,(select min(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayMinTemp
,(select avg(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayAvgTemp
,(select max(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightMaxTemp
,(select min(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightMinTemp
,(select avg(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightAvgTemp
,(select sum(
Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther
)
from weather_data
group by [Station], [Year], [Month], [Day])
as TotalPrecip
,(case
when [1HourTrace] is not null then 1
when [6HourTrace] is not null then 1
when [24HourTrace] is not null then 1
when [OtherTrace] is not null then 1
else null
end
)as PrecipTrace
,avg(SnowDepth) over (partition by [Station], [Year], [Month], [Day]) as SnowDepth
,(case
when SnowDepth is not null then 1
else null
end
) as SnowTrace
,COUNT(*) as RecordCount
from weather_data
where [Station] < 20
group by [Station]
,[Year]
,[Month]
,[Day]
This next one wanted all the columns listed in GROUP BY since they weren't a part of an aggregate function. I may be incorrect, but wouldn't including those have skewed the overall averages I'm in search of for each day?
select a.[Station] as [Station]
,a.[Year] as [Year]
,a.[Month] as [Month]
,a.[Day] as [Day]
,b.WindDirection as WindDirection
,b.WindSpeed as WindSpeed
,b.WindGust as WindGust
,b.CloudCeiling as CloudCeiling
,b.Visibility as Visibility
,b.Temperature as Temperature
,b.DewPoint as DewPoint
,b.SeaLevelPressure as SeaLevelPressure
,b.Altimeter as Altimeter
,b.StationPressure as StationPressure
,b.MaxTemp as MaxTemp
,b.MinTemp as MinTemp
,b.DayMaxTemp as DayMaxTemp
,b.DayMinTemp as DayMinTemp
,b.DayAvgTemp as DayAvgTemp
,b.NightMaxTemp as NightMaxTemp
,b.NightMinTemp as NightMinTemp
,b.NightAvgTemp as NightAvgTemp
,b.TotalPrecip as TotalPrecip
,(select max(
case
when a.[1HourTrace] is not null or
a.[6HourTrace] is not null or
a.[24HourTrace] is not null or
a.[OtherTrace] is not null then 1
else null
end
)
)as PrecipTrace
,avg(b.SnowDepth )as SnowDepth
,(select max(
case
when a.SnowTrace is not null then 1
else null
end
)
)as SnowTrace
,avg(b.RecordCount)
from
weather_data a
join (
select [Station]
,[Year]
,[Month]
,[Day]
,avg(WindDirection) as WindDirection
,avg(WindSpeed) as WindSpeed
,avg(WindGust) as WindGust
,avg(CloudCeiling) as CloudCeiling
,avg(Visibility) as Visibility
,avg(Temperature) as Temperature
,avg(DewPoint) as DewPoint
,avg(SeaLevelPressure) as SeaLevelPressure
,avg(Altimeter) as Altimeter
,avg(StationPressure) as StationPressure
,avg(MaxTemp) as MaxTemp
,avg(MinTemp) as MinTemp
,(select max(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayMaxTemp
,(select min(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayMinTemp
,(select avg(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayAvgTemp
,(select max(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightMaxTemp
,(select min(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightMinTemp
,(select avg(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightAvgTemp
,(select sum(
Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther
)
from weather_data
group by [Station], [Year], [Month], [Day]
) as TotalPrecip
,avg(SnowDepth) as SnowDepth
,COUNT(*) as RecordCount
from weather_data
group by [Station]
,[Year]
,[Month]
,[Day]
) as b
on a.Station = b.Station
where a.Station = 5
group by a.[Station]
,a.[Year]
,a.[Month]
,a.[Day]
The next gave an error on "line 1" stating that the subquery returned multiple values.
select a.[Station] as [Station]
,a.[Year] as [Year]
,a.[Month] as [Month]
,a.[Day] as [Day]
,avg(b.WindDirection )as WindDirection
,avg(b.WindSpeed )as WindSpeed
,avg(b.WindGust )as WindGust
,avg(b.CloudCeiling )as CloudCeiling
,avg(b.Visibility )as Visibility
,avg(b.Temperature )as Temperature
,avg(b.DewPoint )as DewPoint
,avg(b.SeaLevelPressure )as SeaLevelPressure
,avg(b.Altimeter )as Altimeter
,avg(b.StationPressure )as StationPressure
,avg(b.MaxTemp )as MaxTemp
,avg(b.MinTemp )as MinTemp
,avg(b.DayMaxTemp )as DayMaxTemp
,avg(b.DayMinTemp )as DayMinTemp
,avg(b.DayAvgTemp )as DayAvgTemp
,avg(b.NightMaxTemp )as NightMaxTemp
,avg(b.NightMinTemp )as NightMinTemp
,avg(b.NightAvgTemp )as NightAvgTemp
,avg(b.TotalPrecip )as TotalPrecip
,(select max(
case
when a.[1HourTrace] is not null or
a.[6HourTrace] is not null or
a.[24HourTrace] is not null or
a.[OtherTrace] is not null then 1
else null
end
)
)as PrecipTrace
,avg(b.SnowDepth )as SnowDepth
,(select max(
case
when a.SnowTrace is not null then 1
else null
end
)
)as SnowTrace
,avg(b.RecordCount)
from
weather_data a
join (
select [Station]
,[Year]
,[Month]
,[Day]
,avg(WindDirection) as WindDirection
,avg(WindSpeed) as WindSpeed
,avg(WindGust) as WindGust
,avg(CloudCeiling) as CloudCeiling
,avg(Visibility) as Visibility
,avg(Temperature) as Temperature
,avg(DewPoint) as DewPoint
,avg(SeaLevelPressure) as SeaLevelPressure
,avg(Altimeter) as Altimeter
,avg(StationPressure) as StationPressure
,avg(MaxTemp) as MaxTemp
,avg(MinTemp) as MinTemp
,(select max(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayMaxTemp
,(select min(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayMinTemp
,(select avg(
Temperature)
from weather_data
where [Daylight] = 1
group by [Station], [Year], [Month], [Day]
) as DayAvgTemp
,(select max(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightMaxTemp
,(select min(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightMinTemp
,(select avg(
Temperature)
from weather_data
where [Daylight] = 0
group by [Station], [Year], [Month], [Day]
) as NightAvgTemp
,(select sum(
Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther
)
from weather_data
group by [Station], [Year], [Month], [Day]
) as TotalPrecip
,avg(SnowDepth) as SnowDepth
,COUNT(*) as RecordCount
from weather_data
group by [Station]
,[Year]
,[Month]
,[Day]
) as b
on a.Station = b.Station
where a.Station = 5
group by a.[Station]
,a.[Year]
,a.[Month]
,a.[Day]
In this last one I attempted, following the advice in the post I linked above, I attempted to use all aggregate functions in the joins, yet I still receive "Column 'bAVG.WindDirection' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."
select a.[Station] as [Station]
,a.[Year] as [Year]
,a.[Month] as [Month]
,a.[Day] as [Day]
,bAVG.WindDirection as WindDirection
,bAVG.WindSpeed as WindSpeed
,bAVG.WindGust as WindGust
,bAVG.CloudCeiling as CloudCeiling
,bAVG.Visibility as Visibility
,bAVG.Temperature as Temperature
,bAVG.DewPoint as DewPoint
,bAVG.SeaLevelPressure as SeaLevelPressure
,bAVG.Altimeter as Altimeter
,bAVG.StationPressure as StationPressure
,bAVG.MaxTemp as MaxTemp
,bAVG.MinTemp as MinTemp
,bAVG.TotalPrecip as TotalPrecip
,bAVG.SnowDepth as SnowDepth
,cDAY.DayMaxTemp as DayMaxTemp
,cDAY.DayMinTemp as DayMinTemp
,cDAY.DayAvgTemp as DayAvgTemp
,dNIGHT.NightMaxTemp as NightMaxTemp
,dNIGHT.NightMinTemp as NightMinTemp
,dNIGHT.NightAvgTemp as NightAvgTemp
,(select max(TracePrecip)
from (values (eTRACE.[1HourTrace])
,(eTRACE.[6HourTrace])
,(eTRACE.[24HourTrace])
,(eTRACE.[OtherTrace])
) as Precip(TracePrecip)
) as PrecipTrace
,max(eTRACE.SnowTrace) as SnowTrace
from weather_data a
join (
select [Station]
,[Year]
,[Month]
,[Day]
,avg(WindDirection) as WindDirection
,avg(WindSpeed) as WindSpeed
,avg(WindGust) as WindGust
,avg(CloudCeiling) as CloudCeiling
,avg(Visibility) as Visibility
,avg(Temperature) as Temperature
,avg(DewPoint) as DewPoint
,avg(SeaLevelPressure) as SeaLevelPressure
,avg(Altimeter) as Altimeter
,avg(StationPressure) as StationPressure
,avg(MaxTemp) as MaxTemp
,avg(MinTemp) as MinTemp
,sum(Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther) as TotalPrecip
,avg(SnowDepth) as SnowDepth
from weather_data
group by [Station]
,[Year]
,[Month]
,[Day]
) bAVG
on a.Station = bAVG.Station and
a.[Year] = bAVG.[Year] and
a.[Month] = bAVG.[Month] and
a.[Day] = bAVG.[Day]
join (
select [Station]
,[Year]
,[Month]
,[Day]
,max(Temperature) as DayMaxTemp
,min(Temperature) as DayMinTemp
,avg(Temperature) as DayAvgTemp
from weather_data
where [Daylight] = 1
group by [Station]
,[Year]
,[Month]
,[Day]
) cDAY
on a.Station = cDAY.Station and
a.[Year] = cDAY.[Year] and
a.[Month] = cDAY.[Month] and
a.[Day] = cDAY.[Day]
join (
select [Station]
,[Year]
,[Month]
,[Day]
,max(Temperature) as NightMaxTemp
,min(Temperature) as NightMinTemp
,avg(Temperature) as NightAvgTemp
from weather_data
where [Daylight] = 0
group by [Station]
,[Year]
,[Month]
,[Day]
) dNIGHT
on a.Station = dNIGHT.Station and
a.[Year] = dNIGHT.[Year] and
a.[Month] = dNIGHT.[Month] and
a.[Day] = dNIGHT.[Day]
join (
select [Station]
,[Year]
,[Month]
,[Day]
,max(cast([1HourTrace] as int)) as [1HourTrace]
,max(cast([6HourTrace] as int)) as [6HourTrace]
,max(cast([24HourTrace] as int)) as [24HourTrace]
,max(cast([OtherTrace] as int)) as [OtherTrace]
,max(cast([SnowTrace] as int)) as [SnowTrace]
from weather_data
group by [Station]
,[Year]
,[Month]
,[Day]
) eTRACE
on a.Station = eTRACE.Station and
a.[Year] = eTRACE.[Year] and
a.[Month] = eTRACE.[Month] and
a.[Day] = eTRACE.[Day]
group by a.[Station]
,a.[Year]
,a.[Month]
,a.[Day]
Even if I exclude the last 4 lines in that final GROUP BY, I receive the error that the first column (Station) is invalid since it's not contained in either an aggregate function or the GROUP BY clause.
Best Answer
Looks like you're making this harder than it needs to be. What about something like this: