Sql-server – Non-aggregated columns in group by

sql-server-2012t-sql

I have a table which contains raw data for various weather conditions. Each day has multiple data readings, some in 5 minute increments all the way up to hour long increments. I am looking to create a table that holds various averages as well as a bit field for certain conditions.

I found one post that I felt was close to my scenario (Confusing Query: Mixing Aggregates and Non-Aggregates) and both solutions gave various errors.

I suppose my question is: is there a way to exclude columns from a GROUP BY without using an aggregate function on them? Would I be better off creating temp tables for for each portion (overall averages, day averages, night averages, and bit fields for trace amounts) and then joining those together in the actual table insert? Or is there some neat trick I am completely missing altogether?

Thank you in advance for any fingers pointed in the right direction!

The create table scripts are as follows.

Original table containing raw data (foreign key on [Station] linking to an unrelated table:

CREATE TABLE [dbo].[raw_data](
[Station] [int] NOT NULL,
[Year] [char](4) NOT NULL,
[Month] [char](2) NOT NULL,
[Day] [char](2) NOT NULL,
[Time] [char](4) NOT NULL,
[Daylight] [bit] NULL,
[WindDirection] [smallint] NULL,
[WindSpeed] [smallint] NULL,
[WindGust] [smallint] NULL,
[CloudCeiling] [smallint] NULL,
[SkyCover] [char](3) NULL,
[LowCloudType] [smallint] NULL,
[MiddleCloudType] [smallint] NULL,
[HighCloudType] [smallint] NULL,
[Visibility] [decimal](5, 2) NULL,
[Temperature] [decimal](5, 2) NULL,
[DewPoint] [decimal](8, 2) NULL,
[SeaLevelPressure] [decimal](6, 2) NULL,
[Altimeter] [decimal](6, 2) NULL,
[StationPressure] [decimal](6, 2) NULL,
[MaxTemp] [smallint] NULL,
[MinTemp] [smallint] NULL,
[Precip1Hour] [decimal](6, 2) NULL,
[1HourTrace] [bit] NULL,
[Precip6Hour] [decimal](6, 2) NULL,
[6HourTrace] [bit] NULL,
[Precip24Hour] [decimal](6, 2) NULL,
[24HourTrace] [bit] NULL,
[PrecipOther] [decimal](6, 2) NULL,
[OtherTrace] [bit] NULL,
[SnowDepth] [smallint] NULL,
[SnowTrace] [bit] NULL

The destination table for this operation is as follows and has no key constraints currently, but will have a Foreign Key on [Station] and a composite Primary Key on Station, Year, Month, and Day once all data has been inserted.

create table station_daily(
Station int
,[Year] char(4)
,[Month] char(2)
,[Day] char(2)
,WindDirection decimal (6,2)
,WindSpeed decimal (6,2)
,WindGust decimal (6,2)
,CloudCeiling decimal (6,2)
,Visibility int
,Temperature decimal(5,2)
,DewPoint decimal(8,2)
,SeaLevelPressure decimal(6,2)
,Altimeter decimal(6,2)
,StationPressure decimal(6,2)
,MaxTemp decimal (6,2)
,MinTemp decimal (6,2)
,DayMaxTemp decimal (6,2)
,DayMinTemp decimal (6,2)
,DayAvgTemp decimal (6,2)
,NightMaxTemp decimal (6,2)
,NightMinTemp decimal (6,2)
,NightAvgTemp decimal (6,2)
,PrecipTotal decimal(6,2)
,PrecipTrace bit
,SnowDepth decimal (6,2)
,SnowTrace bit
,RecordCount int

These are my failed attempts at inserting the data into [station_daily].

The following script gave the error "Column 'weather_data.WindDirection' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."

insert into station_daily
select [Station] as [Station]
    ,[Year] as [Year]
    ,[Month] as [Month]
    ,[Day] as [Day]
    ,avg(WindDirection) over (partition by [Station], [Year], [Month], [Day]) as WindDirection
    ,avg(WindSpeed) over (partition by [Station], [Year], [Month], [Day]) as WindSpeed
    ,avg(WindGust) over (partition by [Station], [Year], [Month], [Day]) as WindGust
    ,avg(CloudCeiling) over (partition by [Station], [Year], [Month], [Day]) as CloudCeiling
    ,avg(Visibility) over (partition by [Station], [Year], [Month], [Day]) as Visibility
    ,avg(Temperature) over (partition by [Station], [Year], [Month], [Day]) as Temperature
    ,avg(DewPoint) over (partition by [Station], [Year], [Month], [Day]) as DewPoint
    ,avg(SeaLevelPressure) over (partition by [Station], [Year], [Month], [Day]) as SeaLevelPressure
    ,avg(Altimeter) over (partition by [Station], [Year], [Month], [Day]) as Altimeter
    ,avg(StationPressure) over (partition by [Station], [Year], [Month], [Day]) as StationPressure
    ,avg(MaxTemp) over (partition by [Station], [Year], [Month], [Day]) as MaxTemp
    ,avg(MinTemp) over (partition by [Station], [Year], [Month], [Day]) as MinTemp
    ,(select max(
                Temperature)
                from weather_data
                where [Daylight] = 1
                group by [Station], [Year], [Month], [Day]
                ) as DayMaxTemp
    ,(select min(
                Temperature)
                from weather_data
                where [Daylight] = 1
                group by [Station], [Year], [Month], [Day]
                ) as DayMinTemp
    ,(select avg(
                Temperature)
                from weather_data
                where [Daylight] = 1
                group by [Station], [Year], [Month], [Day]
                ) as DayAvgTemp
    ,(select max(
                Temperature)
                from weather_data
                where [Daylight] = 0
                group by [Station], [Year], [Month], [Day]
                ) as NightMaxTemp
    ,(select min(
                Temperature)
                from weather_data
                where [Daylight] = 0
                group by [Station], [Year], [Month], [Day]
                ) as NightMinTemp
    ,(select avg(
                Temperature)
                from weather_data
                where [Daylight] = 0
                group by [Station], [Year], [Month], [Day]
                ) as NightAvgTemp
    ,(select sum(
                Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther
                )
                from weather_data 
                group by [Station], [Year], [Month], [Day])
                as TotalPrecip
    ,(case
        when [1HourTrace] is not null then 1
        when [6HourTrace] is not null then 1
        when [24HourTrace] is not null then 1
        when [OtherTrace] is not null then 1
        else null
        end
        )as PrecipTrace
    ,avg(SnowDepth)  over (partition by [Station], [Year], [Month], [Day]) as SnowDepth
    ,(case
        when SnowDepth is not null then 1
        else null
        end
        ) as SnowTrace
    ,COUNT(*) as RecordCount
    from weather_data
    where [Station] < 20
    group by [Station]
                ,[Year]
                ,[Month]
                ,[Day]

This next one wanted all the columns listed in GROUP BY since they weren't a part of an aggregate function. I may be incorrect, but wouldn't including those have skewed the overall averages I'm in search of for each day?

select a.[Station] as [Station]
    ,a.[Year] as [Year]
    ,a.[Month] as [Month]
    ,a.[Day] as [Day]
    ,b.WindDirection        as          WindDirection
    ,b.WindSpeed            as          WindSpeed
    ,b.WindGust             as          WindGust
    ,b.CloudCeiling         as          CloudCeiling
    ,b.Visibility           as          Visibility
    ,b.Temperature          as          Temperature
    ,b.DewPoint             as          DewPoint
    ,b.SeaLevelPressure     as          SeaLevelPressure
    ,b.Altimeter            as          Altimeter
    ,b.StationPressure      as          StationPressure
    ,b.MaxTemp              as          MaxTemp
    ,b.MinTemp              as          MinTemp
    ,b.DayMaxTemp           as          DayMaxTemp
    ,b.DayMinTemp           as          DayMinTemp
    ,b.DayAvgTemp           as          DayAvgTemp
    ,b.NightMaxTemp         as          NightMaxTemp
    ,b.NightMinTemp         as          NightMinTemp
    ,b.NightAvgTemp         as          NightAvgTemp
    ,b.TotalPrecip          as          TotalPrecip
    ,(select max(
                case
                    when a.[1HourTrace] is not null or
                        a.[6HourTrace] is not null or
                        a.[24HourTrace] is not null or
                        a.[OtherTrace] is not null then 1
                    else null
                    end
                )
        )as PrecipTrace
    ,avg(b.SnowDepth            )as         SnowDepth
    ,(select max(
                case
                    when a.SnowTrace is not null then 1
                    else null
                    end
                ) 
        )as SnowTrace
    ,avg(b.RecordCount)

from
    weather_data a
join (
    select [Station]
            ,[Year]
            ,[Month]
            ,[Day]
            ,avg(WindDirection)             as          WindDirection
            ,avg(WindSpeed)                 as          WindSpeed
            ,avg(WindGust)                  as          WindGust
            ,avg(CloudCeiling)              as          CloudCeiling
            ,avg(Visibility)                as          Visibility
            ,avg(Temperature)               as          Temperature
            ,avg(DewPoint)                  as          DewPoint
            ,avg(SeaLevelPressure)          as          SeaLevelPressure
            ,avg(Altimeter)                 as          Altimeter
            ,avg(StationPressure)           as          StationPressure
            ,avg(MaxTemp)                   as          MaxTemp
            ,avg(MinTemp)                   as          MinTemp
            ,(select max(
                        Temperature)
                        from weather_data
                        where [Daylight] = 1
                        group by [Station], [Year], [Month], [Day]
                        )                   as          DayMaxTemp
            ,(select min(
                        Temperature)
                        from weather_data
                        where [Daylight] = 1
                        group by [Station], [Year], [Month], [Day]
                        )                   as          DayMinTemp
            ,(select avg(
                        Temperature)
                        from weather_data
                        where [Daylight] = 1
                        group by [Station], [Year], [Month], [Day]
                        )                   as          DayAvgTemp
            ,(select max(
                        Temperature)
                        from weather_data
                        where [Daylight] = 0
                        group by [Station], [Year], [Month], [Day]
                        )                   as          NightMaxTemp
            ,(select min(
                        Temperature)
                        from weather_data
                        where [Daylight] = 0
                        group by [Station], [Year], [Month], [Day]
                        )                   as          NightMinTemp
            ,(select avg(
                        Temperature)
                        from weather_data
                        where [Daylight] = 0
                        group by [Station], [Year], [Month], [Day]
                        )                   as          NightAvgTemp
            ,(select sum(
                        Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther
                        )
                        from weather_data 
                        group by [Station], [Year], [Month], [Day]
                        )                   as          TotalPrecip
            ,avg(SnowDepth)                 as          SnowDepth
            ,COUNT(*)                       as          RecordCount
    from weather_data
    group by [Station]
                ,[Year]
                ,[Month]
                ,[Day]
    ) as b
    on a.Station = b.Station
    where a.Station = 5
    group by a.[Station]
                ,a.[Year]
                ,a.[Month]
                ,a.[Day]

The next gave an error on "line 1" stating that the subquery returned multiple values.

select a.[Station] as [Station]
    ,a.[Year] as [Year]
    ,a.[Month] as [Month]
    ,a.[Day] as [Day]
    ,avg(b.WindDirection        )as         WindDirection
    ,avg(b.WindSpeed            )as         WindSpeed
    ,avg(b.WindGust             )as         WindGust
    ,avg(b.CloudCeiling         )as         CloudCeiling
    ,avg(b.Visibility           )as         Visibility
    ,avg(b.Temperature          )as         Temperature
    ,avg(b.DewPoint             )as         DewPoint
    ,avg(b.SeaLevelPressure     )as         SeaLevelPressure
    ,avg(b.Altimeter            )as         Altimeter
    ,avg(b.StationPressure      )as         StationPressure
    ,avg(b.MaxTemp              )as         MaxTemp
    ,avg(b.MinTemp              )as         MinTemp
    ,avg(b.DayMaxTemp           )as         DayMaxTemp
    ,avg(b.DayMinTemp           )as         DayMinTemp
    ,avg(b.DayAvgTemp           )as         DayAvgTemp
    ,avg(b.NightMaxTemp         )as         NightMaxTemp
    ,avg(b.NightMinTemp         )as         NightMinTemp
    ,avg(b.NightAvgTemp         )as         NightAvgTemp
    ,avg(b.TotalPrecip          )as         TotalPrecip
    ,(select max(
                case
                    when a.[1HourTrace] is not null or
                        a.[6HourTrace] is not null or
                        a.[24HourTrace] is not null or
                        a.[OtherTrace] is not null then 1
                    else null
                    end
                )
        )as PrecipTrace
    ,avg(b.SnowDepth            )as         SnowDepth
    ,(select max(
                case
                    when a.SnowTrace is not null then 1
                    else null
                    end
                ) 
        )as SnowTrace
    ,avg(b.RecordCount)

from
    weather_data a
join (
    select [Station]
            ,[Year]
            ,[Month]
            ,[Day]
            ,avg(WindDirection)             as          WindDirection
            ,avg(WindSpeed)                 as          WindSpeed
            ,avg(WindGust)                  as          WindGust
            ,avg(CloudCeiling)              as          CloudCeiling
            ,avg(Visibility)                as          Visibility
            ,avg(Temperature)               as          Temperature
            ,avg(DewPoint)                  as          DewPoint
            ,avg(SeaLevelPressure)          as          SeaLevelPressure
            ,avg(Altimeter)                 as          Altimeter
            ,avg(StationPressure)           as          StationPressure
            ,avg(MaxTemp)                   as          MaxTemp
            ,avg(MinTemp)                   as          MinTemp
            ,(select max(
                        Temperature)
                        from weather_data
                        where [Daylight] = 1
                        group by [Station], [Year], [Month], [Day]
                        )                   as          DayMaxTemp
            ,(select min(
                        Temperature)
                        from weather_data
                        where [Daylight] = 1
                        group by [Station], [Year], [Month], [Day]
                        )                   as          DayMinTemp
            ,(select avg(
                        Temperature)
                        from weather_data
                        where [Daylight] = 1
                        group by [Station], [Year], [Month], [Day]
                        )                   as          DayAvgTemp
            ,(select max(
                        Temperature)
                        from weather_data
                        where [Daylight] = 0
                        group by [Station], [Year], [Month], [Day]
                        )                   as          NightMaxTemp
            ,(select min(
                        Temperature)
                        from weather_data
                        where [Daylight] = 0
                        group by [Station], [Year], [Month], [Day]
                        )                   as          NightMinTemp
            ,(select avg(
                        Temperature)
                        from weather_data
                        where [Daylight] = 0
                        group by [Station], [Year], [Month], [Day]
                        )                   as          NightAvgTemp
            ,(select sum(
                        Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther
                        )
                        from weather_data 
                        group by [Station], [Year], [Month], [Day]
                        )                   as          TotalPrecip
            ,avg(SnowDepth)                 as          SnowDepth
            ,COUNT(*)                       as          RecordCount
    from weather_data
    group by [Station]
                ,[Year]
                ,[Month]
                ,[Day]
    ) as b
    on a.Station = b.Station
    where a.Station = 5
    group by a.[Station]
                ,a.[Year]
                ,a.[Month]
                ,a.[Day]

In this last one I attempted, following the advice in the post I linked above, I attempted to use all aggregate functions in the joins, yet I still receive "Column 'bAVG.WindDirection' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause."

select a.[Station]                  as          [Station]
    ,a.[Year]                   as          [Year]
    ,a.[Month]                  as          [Month]
    ,a.[Day]                    as          [Day]
    ,bAVG.WindDirection         as          WindDirection
    ,bAVG.WindSpeed             as          WindSpeed
    ,bAVG.WindGust              as          WindGust
    ,bAVG.CloudCeiling          as          CloudCeiling
    ,bAVG.Visibility            as          Visibility
    ,bAVG.Temperature           as          Temperature
    ,bAVG.DewPoint              as          DewPoint
    ,bAVG.SeaLevelPressure      as          SeaLevelPressure
    ,bAVG.Altimeter             as          Altimeter
    ,bAVG.StationPressure       as          StationPressure
    ,bAVG.MaxTemp               as          MaxTemp
    ,bAVG.MinTemp               as          MinTemp
    ,bAVG.TotalPrecip           as          TotalPrecip
    ,bAVG.SnowDepth             as          SnowDepth
    ,cDAY.DayMaxTemp            as          DayMaxTemp
    ,cDAY.DayMinTemp            as          DayMinTemp
    ,cDAY.DayAvgTemp            as          DayAvgTemp
    ,dNIGHT.NightMaxTemp        as          NightMaxTemp
    ,dNIGHT.NightMinTemp        as          NightMinTemp
    ,dNIGHT.NightAvgTemp        as          NightAvgTemp
    ,(select max(TracePrecip)
        from (values (eTRACE.[1HourTrace])
                        ,(eTRACE.[6HourTrace])
                        ,(eTRACE.[24HourTrace])
                        ,(eTRACE.[OtherTrace])
                        ) as Precip(TracePrecip)
    ) as PrecipTrace
    ,max(eTRACE.SnowTrace)      as          SnowTrace
from weather_data a                     
join (
        select [Station]
                  ,[Year]
                  ,[Month]
                  ,[Day]
                  ,avg(WindDirection)                                                   as          WindDirection
                  ,avg(WindSpeed)                                                       as          WindSpeed
                  ,avg(WindGust)                                                    as          WindGust
                  ,avg(CloudCeiling)                                                as          CloudCeiling
                  ,avg(Visibility)                                                  as          Visibility
                  ,avg(Temperature)                                                 as          Temperature
                  ,avg(DewPoint)                                                    as          DewPoint
                  ,avg(SeaLevelPressure)                                            as          SeaLevelPressure
                  ,avg(Altimeter)                                                   as          Altimeter
                  ,avg(StationPressure)                                             as          StationPressure
                  ,avg(MaxTemp)                                                         as          MaxTemp
                  ,avg(MinTemp)                                                         as          MinTemp
               ,sum(Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther)         as          TotalPrecip
               ,avg(SnowDepth)                                                      as          SnowDepth
            from weather_data
            group by [Station]
                        ,[Year]
                        ,[Month]
                        ,[Day]
        ) bAVG
on a.Station                             =                  bAVG.Station and
    a.[Year]                             =                  bAVG.[Year] and
    a.[Month]                            =                  bAVG.[Month] and
    a.[Day]                              =                  bAVG.[Day]
join (  
        select [Station]    
                ,[Year] 
                ,[Month]    
                ,[Day]  
                ,max(Temperature)                                                   as          DayMaxTemp
                ,min(Temperature)                                                   as          DayMinTemp
                ,avg(Temperature)                                                   as          DayAvgTemp
            from weather_data       
            where [Daylight] = 1    
            group by [Station]  
                        ,[Year] 
                        ,[Month]    
                        ,[Day]
        ) cDAY  
on a.Station                             =                  cDAY.Station and
    a.[Year]                             =                  cDAY.[Year] and
    a.[Month]                            =                  cDAY.[Month] and
    a.[Day]                              =                  cDAY.[Day]
join (  
        select [Station]    
                ,[Year] 
                ,[Month]    
                ,[Day]  
                ,max(Temperature)                                                   as          NightMaxTemp
                ,min(Temperature)                                                   as          NightMinTemp
                ,avg(Temperature)                                                   as          NightAvgTemp
            from weather_data   
            where [Daylight] = 0
            group by [Station]  
                        ,[Year] 
                        ,[Month]    
                        ,[Day]      
        ) dNIGHT    
on a.Station                            =                   dNIGHT.Station and
    a.[Year]                            =                   dNIGHT.[Year] and
    a.[Month]                           =                   dNIGHT.[Month] and
    a.[Day]                             =                   dNIGHT.[Day]
join (  
        select [Station]
                ,[Year]
                ,[Month]
                ,[Day]
                ,max(cast([1HourTrace] as int))                             as          [1HourTrace]    
                   ,max(cast([6HourTrace] as int))                              as          [6HourTrace] 
                   ,max(cast([24HourTrace] as int))                         as          [24HourTrace]
                   ,max(cast([OtherTrace] as int))                              as          [OtherTrace]
                ,max(cast([SnowTrace] as int))                              as          [SnowTrace]
            from weather_data   
            group by [Station]  
                        ,[Year] 
                        ,[Month]    
                        ,[Day]
        ) eTRACE
on a.Station                            =                   eTRACE.Station and
    a.[Year]                            =                   eTRACE.[Year] and
    a.[Month]                           =                   eTRACE.[Month] and
    a.[Day]                             =                   eTRACE.[Day]
group by a.[Station]
            ,a.[Year]
            ,a.[Month]
            ,a.[Day]

Even if I exclude the last 4 lines in that final GROUP BY, I receive the error that the first column (Station) is invalid since it's not contained in either an aggregate function or the GROUP BY clause.

Best Answer

Looks like you're making this harder than it needs to be. What about something like this:

select 
     [Station] as [Station]
    ,[Year] as [Year]
    ,[Month] as [Month]
    ,[Day] as [Day]
    ,avg(WindDirection) as WindDirection
    ,avg(WindSpeed) as WindSpeed
    ,avg(WindGust) as WindGust
    ,avg(CloudCeiling) as CloudCeiling
    ,avg(Visibility) as Visibility
    ,avg(Temperature) as Temperature
    ,avg(DewPoint) as DewPoint
    ,avg(SeaLevelPressure) as SeaLevelPressure
    ,avg(Altimeter) as Altimeter
    ,avg(StationPressure) as StationPressure
    ,avg(MaxTemp) as MaxTemp
    ,avg(MinTemp) as MinTemp
    ,max(case when [Daylight] = 1 then Temperature else null end) as DayMaxTemp
    ,min(case when [Daylight] = 1 then Temperature else null end) as DayMinTemp
    ,avg(case when [Daylight] = 1 then Temperature else null end) as DayAvgTemp
    ,max(case when [Daylight] = 0 then Temperature else null end) as NightMaxTemp
    ,min(case when [Daylight] = 0 then Temperature else null end) as NightMinTemp
    ,avg(case when [Daylight] = 0 then Temperature else null end) as NightAvgTemp
    ,sum(Precip1Hour + Precip6Hour + Precip24Hour + PrecipOther) as TotalPrecip
    ,max(case
        when [1HourTrace] is not null then 1
        when [6HourTrace] is not null then 1
        when [24HourTrace] is not null then 1
        when [OtherTrace] is not null then 1
        else 0
        end
        ) as PrecipTrace
    ,avg(SnowDepth) as SnowDepth
    ,max(case when SnowDepth is not null then 1 else null end) as SnowTrace
    ,COUNT(*) as RecordCount
    from weather_data
    where [Station] < 20
    group by [Station], [Year] ,[Month] ,[Day];