Sql-server – group by excluding a column

Tags: aggregate, cte, sql-server, sql-server-2012, t-sql

I have this code:

insert into [dbo].[NGC_Agent_Intervals]
SELECT  
    AgentID, 
    date, 
    [Hour] = DATEPART(HOUR, RSRange),
    hour_quarter = CONCAT(DATEPART(MINUTE, RSRange) ,'-', DATEPART(MINUTE, RERange)),
    DepartmentID,
    [DepartmentName], 
    [AgentState], 
    [AgentStateReason], 
    ( select AgentStateReasonDescription from RealRanges) as AgentStateReasonDescription,
    [ExtensionID],
    [WorkstationID],

    Duration = sum(CASE 
      WHEN RSRange <= aStart and aend >= RERange THEN DATEDIFF(SECOND, aStart, DATEADD(MINUTE, 15, RSRange)) 
      when  RSRange <= aStart and aend < RERange THEN DATEDIFF(SECOND, aStart, aEnd) 
      WHEN RERange >= aEnd THEN DATEDIFF(SECOND, RSRange, aEnd) 
      ELSE DATEDIFF(SECOND, RSRange, RERange)
    END)
 FROM RealRanges

 WHERE DATEDIFF(SECOND, RSRange, aEnd) > 0
   AND AgentState NOT IN ('logout', 'LOGIN')


 GROUP BY duration,RealRanges.AgentID, RealRanges.date, RealRanges.hour,
  DepartmentID,[DepartmentName], [AgentState], [AgentStateReason],
  [ExtensionID],[WorkstationID], RSRange, RealRanges.RERange

I run this code after using a CTE to split each original row of a table into several rows, one per quarter-hour.
What I want to do is sum the duration across rows whose values match in certain columns.

These are the original table values, whose durations I want to sum:

If I don't sum the duration, it looks like this:

It should look like this after the sum:

The only problem is that I want to group by only the following columns:

AgentID, Date, Hour, Hour_Quarter, AgentState, AgentStateReason

and exclude the other columns, so that even if ExtensionID differs (for example) I still get the sum across both rows.
I tried doing that with a sub-query (in the code above) but I get an error:

So what am I missing?

Best Answer

To avoid grouping by a specific column that returns multiple values, you can either remove it from the query, or you can explicitly tell it which value you want. You can do this using aggregate or analytic functions, like:
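As a minimal sketch of that idea (the table variable and sample values here are made up for illustration and are not the asker's data), MIN or MAX collapses the differing ExtensionID values into a single value per group:

```sql
-- Hypothetical sample data: two rows that differ only in ExtensionID.
DECLARE @Intervals TABLE
(
    AgentID     int,
    AgentState  varchar(20),
    ExtensionID int,
    Duration    int
);

INSERT INTO @Intervals (AgentID, AgentState, ExtensionID, Duration)
VALUES (1, 'Ready', 100, 300),
       (1, 'Ready', 200, 120);

-- ExtensionID is not in the GROUP BY, so it has to be wrapped in an aggregate.
SELECT
    AgentID,
    AgentState,
    MinExtensionID = MIN(ExtensionID),
    MaxExtensionID = MAX(ExtensionID),
    Duration       = SUM(Duration)
FROM @Intervals
GROUP BY AgentID, AgentState;
```

With this sample data the two rows collapse into one, with MinExtensionID 100, MaxExtensionID 200 and a Duration of 420.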

For numeric values you can also present new values, such as:

SUM(Duration)
AVG(Duration)

But basically, if you want to show a single row collapsed from two different ExtensionID values, you need to define how to do that. In this case, is ExtensionID important? If it is, do you want to show the most recent one? The first one? The one that occurs most frequently? What about ties? There are solutions to all of these, but you have to know what you want your query to return.
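If "the most recent one" is the answer, one possible approach is a window function rather than a plain GROUP BY. The sketch below assumes a hypothetical StartTime column to define "most recent" and uses made-up sample rows; it is one way to do it, not the only one:

```sql
-- Hypothetical sample data; StartTime is an assumed column.
DECLARE @Intervals TABLE
(
    AgentID     int,
    AgentState  varchar(20),
    ExtensionID int,
    StartTime   datetime,
    Duration    int
);

INSERT INTO @Intervals (AgentID, AgentState, ExtensionID, StartTime, Duration)
VALUES (1, 'Ready', 100, '2016-01-05 10:00:00', 300),
       (1, 'Ready', 200, '2016-01-05 10:07:00', 120),
       (2, 'Break', 300, '2016-01-05 10:02:00', 60);

WITH Ranked AS
(
    SELECT
        AgentID,
        AgentState,
        ExtensionID,
        -- Total per group, repeated on every row of that group.
        TotalDuration = SUM(Duration) OVER (PARTITION BY AgentID, AgentState),
        -- rn = 1 marks the latest row in each group; ExtensionID breaks ties.
        rn = ROW_NUMBER() OVER (PARTITION BY AgentID, AgentState
                                ORDER BY StartTime DESC, ExtensionID DESC)
    FROM @Intervals
)
SELECT AgentID, AgentState, ExtensionID, Duration = TotalDuration
FROM Ranked
WHERE rn = 1;
```

For agent 1 this keeps ExtensionID 200 (the later row) with a Duration of 420, and agent 2 keeps its single row. The explicit tie-breaker in the ORDER BY is what answers the "what about ties?" question deterministically.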

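In SQL Server, every selected column that is not inside an aggregate must appear in the GROUP BY. You can't just leave columns out as MySQL allows (when ONLY_FULL_GROUP_BY is disabled); there the value returned for such a column is picked arbitrarily from the group, which gives, let's say, weird results.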

Related Solutions

Sql-server – T-SQL SELECT use multiple indexes for no reason

In your first query the index is used to find rows matching [InverterData].[Date] = '05.01.2016'; the engine then needs to look up the rest of the row data to be able to return ACPower and DCPower. If you remove these columns from the output, you'll see the extra lookup go away.

You could include the extra columns in the index with:

CREATE NONCLUSTERED INDEX [NonClusteredIndex] 
ON [InverterData] ([Date] DESC, [InverterID] ASC) INCLUDE ([ACPower], [DCPower])

This removes the extra lookup via the clustered index at the expense of making the non-clustered index consume more space on disk (and in memory). This trade-off between query speed and space used is something you'll have to decide upon by running benchmarks on the parts of the application that use that table.
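A small sketch for trying this out in isolation follows. The temp table, its column types, the index name and the date literal are all assumptions made for illustration; only the column names come from the index definition above:

```sql
-- Hypothetical stand-in for InverterData; column types are assumptions.
CREATE TABLE #InverterData
(
    ID         int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    [Date]     date NOT NULL,
    InverterID int  NOT NULL,
    ACPower    int  NULL,
    DCPower    int  NULL
);

-- Key columns drive the seek; INCLUDE columns are stored at the leaf level only.
CREATE NONCLUSTERED INDEX IX_InverterData_Date_Covering
ON #InverterData ([Date] DESC, InverterID ASC) INCLUDE (ACPower, DCPower);

-- Every column referenced here is present in the index,
-- so the query can be answered without touching the clustered index.
SELECT InverterID, ACPower, DCPower
FROM #InverterData
WHERE [Date] = '20160105';

DROP TABLE #InverterData;
```

Comparing the actual execution plan with and without the INCLUDE clause should show whether the key lookup disappears for your data.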

Note that you could also do:

CREATE NONCLUSTERED INDEX [NonClusteredIndex] 
ON [InverterData] ([Date] DESC, [InverterID] ASC, [ACPower], [DCPower])

which uses the extra columns as part of the key rather than just INCLUDING them. The former example is likely to be more efficient as the values are unlikely to be filtered/sorted upon so it saves page splits caused by the engine trying to keep the (effectively random) values stored in order.

Sql-server – T-SQL to Group time interval change by date range in sql server

Here is an example based on Itzik Ben-Gan's work on gaps and islands in sequences (see the article: Gaps).

DECLARE @vt_Source AS TABLE
( ts datetime NOT NULL PRIMARY KEY,
 interval tinyint NULL
)

INSERT INTO @vt_Source(ts, interval)
VALUES('2016-12-31 00:28:00',     NULL)
,('2016-12-31 00:29:00'  ,   1)
,('2016-12-31 00:30:00'  ,   1)
,('2016-12-31 00:45:00'  ,   15)
,('2016-12-31 01:00:00'  ,   15)
,('2016-12-31 01:15:00'  ,   15)
,('2016-12-31 01:16:00'  ,   1)
,('2016-12-31 01:17:00'  ,   1)
,('2016-12-31 01:18:00'  ,   1)
,('2016-12-31 01:19:00'  ,   1)


SELECT
   min(ts_prev) AS startDate
  ,max(ts) AS endDate
  ,interval
FROM
   (SELECT
         ts
         ,interval
         ,ROW_NUMBER() OVER(ORDER BY ts ASC) AS  rn_all
         ,ROW_NUMBER() OVER(PARTITION BY interval ORDER BY ts ASC) AS rn_group
         ,LAG(ts,1,ts) OVER(ORDER BY ts ASC) AS ts_prev
    FROM
        @vt_Source
   )A
WHERE
   A.interval IS NOT NULL
GROUP BY
   rn_all - rn_group
   ,interval
ORDER BY 
   startDate ASC

Output:

startDate            endDate              interval
31/12/2016 00:28:00  31/12/2016 00:30:00  1
31/12/2016 00:30:00  31/12/2016 01:15:00  15
31/12/2016 01:15:00  31/12/2016 01:19:00  1

I added a WHERE clause to eliminate the first row, the one that has NULL in the interval column.
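To see why grouping by rn_all - rn_group works, it can help to look at the intermediate values. The sketch below uses a shortened, made-up sample (the @Sample name and the island_id alias are not from the query above) and prints both row numbers and their difference:

```sql
-- Shortened hypothetical sample, same shape as @vt_Source.
DECLARE @Sample TABLE
(
    ts       datetime NOT NULL PRIMARY KEY,
    interval tinyint  NULL
);

INSERT INTO @Sample (ts, interval)
VALUES ('2016-12-31 00:29:00', 1),
       ('2016-12-31 00:30:00', 1),
       ('2016-12-31 00:45:00', 15),
       ('2016-12-31 01:00:00', 15),
       ('2016-12-31 01:16:00', 1);

SELECT
    ts,
    interval,
    rn_all    = ROW_NUMBER() OVER (ORDER BY ts),
    rn_group  = ROW_NUMBER() OVER (PARTITION BY interval ORDER BY ts),
    -- Stays constant while consecutive rows share the same interval.
    island_id = ROW_NUMBER() OVER (ORDER BY ts)
              - ROW_NUMBER() OVER (PARTITION BY interval ORDER BY ts)
FROM @Sample
ORDER BY ts;
```

Here island_id comes out as 0, 0, 2, 2, 2. The last row (interval 1) gets the same difference as the preceding island of 15s, which is why the GROUP BY has to include interval as well as the difference.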

http://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=fe1888d0a934d73de0ed9887aaf4d482
