Sql-server – SQL query – How to select to display group, group total, grand total

aggregategroup bysql server

I've seen some advice but want to know the best way to select a group, group total, grand total from a data table WITHOUT using a subquery or unnecessary join.

My initial thought was something like this:

select   product_family, 
         sum(widgets), 
         sum(widgets) over ()
from     table.widget
group by product_family

or the following:

select   product_family, 
         sum(widgets), 
         sum(widgets) over (partition by all_field)
from     table.widget
group by product_family

Obviously neither of these works. I know a partition can be of a higher order than the actual rows/ groups, but not how to partition by "all" other than leaving it blank as the first example. However, it breaks with the group by statement.

Best I can find is something like this:

select product_family, 
       family_sum, 
       sum(family_sum) over () as grand_sum 
from (
      select   product_family, sum(widgets)
      from     table.widget
      group by product_family
     ) as A

That still involves a subquery though, which is okay. I just feel like I'm missing an easy function here.

Best Answer

GROUP BY ROLLUP or - even better - GROUP BY GROUPING SETS is best if you want an additional row with the grand total.

If you want the grand total in every row - in another column - then your query needs a minor adjustment. You can use the aggregate SUM(widgets) in a window function:

select   product_family, 
         sum(widgets)              as total, 
         sum(sum(widgets)) over () as grand_total
from     table.widget
group by product_family ;

Related Solutions

Sql-server – How to write windowing query which sums a column to create discrete buckets

I am not sure what type of performance you are looking for, but if CLR or external app is not an option, a cursor is all that is left. On my aged laptop I get through 1,000,000 rows in about 100 seconds using the following solution. The nice thing about it is that it scales linearly, so I would be looking at a little about 20 minutes to run through the entire thing. With a decent server you will be faster, but not an order of magnitude, so it would still take several minutes to complete this. If this is a one off process, you probably can afford the slowness. If you need to run this as a report or similar regularly, you might want to store the values in the same table un update them as new rows get added, e.g. in a trigger.

Anyway, here is the code:

IF OBJECT_ID('dbo.MyTable') IS NOT NULL DROP TABLE dbo.MyTable;

CREATE TABLE dbo.MyTable(
 Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
 v NUMERIC(5,3) DEFAULT ABS(CHECKSUM(NEWID())%100)/100.0
);


MERGE dbo.MyTable T
USING (SELECT TOP(1000000) 1 X FROM sys.system_internals_partition_columns A,sys.system_internals_partition_columns B,sys.system_internals_partition_columns C,sys.system_internals_partition_columns D)X
ON(1=0)
WHEN NOT MATCHED THEN
INSERT DEFAULT VALUES;

--SELECT * FROM dbo.MyTable

DECLARE @st DATETIME2 = SYSUTCDATETIME();
DECLARE cur CURSOR FAST_FORWARD FOR
  SELECT Id,v FROM dbo.MyTable
  ORDER BY Id;

DECLARE @id INT;
DECLARE @v NUMERIC(5,3);
DECLARE @running_total NUMERIC(6,3) = 0;
DECLARE @bucket INT = 1;

CREATE TABLE #t(
 id INT PRIMARY KEY CLUSTERED,
 v NUMERIC(5,3),
 bucket INT,
 running_total NUMERIC(6,3)
);

OPEN cur;
WHILE(1=1)
BEGIN
  FETCH NEXT FROM cur INTO @id,@v;
  IF(@@FETCH_STATUS <> 0) BREAK;
  IF(@running_total + @v > 1)
  BEGIN
    SET @running_total = 0;
    SET @bucket += 1;
  END;
  SET @running_total += @v;
  INSERT INTO #t(id,v,bucket,running_total)
  VALUES(@id,@v,@bucket, @running_total);
END;
CLOSE cur;
DEALLOCATE cur;
SELECT DATEDIFF(SECOND,@st,SYSUTCDATETIME());
SELECT * FROM #t;

GO 
DROP TABLE #t;

It drops and recreates the table MyTable, fills it with 1000000 rows and then goes to work.

The cursor copies each row into a temp table while running the calculations. At the end the select returns the calculated results. You might be a little faster if you don't copy the data around but do an in-place update instead.

If you have an option to upgrade to SQL 2012 you can look at the new window-spool supported moving window aggregates, that should give you better performance.

On a side note, if you have an assembly installed with permission_set=safe, you can do more bad stuff to a server with standard T-SQL than with the assembly, so I would keep working on removing that barrier - You have a good use case here where CLR really would help you.

Sql-server – Displaying Parent Child Information, With Certain Parent Columns Only Shown Once

A general policy is to let the reporting layer handle things like only printing ParentIncome once. However, since you are delivering a spreadsheet that will be used by others in who knows what manner, then I suppose you are stuck.

Because of the knowledge required you will need to develop some extra information (MIN, MAX, first, last, etc.) that is not known by a single row. There are dodges different from ROW_NUMBER() OVER (PARTITION...), but there will still be an extra step.

See the following:

CREATE TABLE #parent
(ParentNum INT,
 ParentName VARCHAR(20),
 ParentIncome INT);

CREATE TABLE #child
(ChildNum INT,
 ChildParentNum INT,
 ChildName VARCHAR(20),
 ChildAllowance INT);

INSERT INTO #parent VALUES(10,'John',50000);
INSERT INTO #parent VALUES(20,'Jane',55000);
INSERT INTO #parent VALUES(30,'Jackie',90000);

INSERT INTO #child VALUES(1,10,'Johnny',5)
INSERT INTO #child VALUES(2,20,'Jackie',10)
INSERT INTO #child VALUES(3,20,'Billy',5)
INSERT INTO #child VALUES(4,20,'Sally',5)
INSERT INTO #child VALUES(5,30,'Monique',0)

-- Basic approach you may be using

See this example SQL Fiddle #1

SELECT pc.ParentName, pc.ParentNum,
    CASE WHEN pc.RowNum = 1 THEN CAST(pc.ParentIncome AS VARCHAR(10)) ELSE '' END as ParentIncome,
    pc.ChildName, pc.ChildNum, pc.ChildAllowance
FROM (SELECT p.ParentNum, p.ParentName, p.ParentIncome, 
         c.ChildNum, c.ChildParentNum, c.ChildName, c.ChildAllowance,
         ROW_NUMBER() OVER (PARTITION BY ParentNum ORDER BY ParentNum) AS RowNum
       FROM #parent p JOIN #child c ON p.ParentNum = c.ChildParentNum) AS pc
ORDER BY pc.ParentNum, pc.ChildNum 

    -- An alternative, but still using a subselect for one element

See this example: SQL Fiddle #2

SELECT  p.ParentName, p.ParentNum, 
        CASE WHEN c.ChildNum = mc.MinChild THEN CAST (ParentIncome AS VARCHAR(10)) ELSE '' END AS ParentIncome,
        c.ChildName, c.ChildNum, c.ChildAllowance
       FROM #parent p 
          JOIN #child as c 
             ON p.ParentNum = c.ChildParentNum
          -- This subselect gives the MIN (or First) ChildNum per Parent
          JOIN (SELECT ChildParentNum, MIN(ChildNum) AS MinChild
                  FROM #child
                  GROUP BY ChildParentNum) AS mc 
             ON mc.ChildParentNum = c.ChildParentNum 
ORDER BY p.ParentNum, c.ChildNum 

drop table #parent
drop table #child

Notice that I cast the ParentIncome as a VARCHAR(10) so that the datatype would be of the same type as the empty income of ''. I originally used a NULL instead of a blank, but that might give you EXCEL problems.

Is it worth doing things this way? It is up to you, but it primarily depends on what you like best. The ROW_NUMBER() is a more powerful operator than MIN() and gives you more options, but in this case it appears that a MIN() will work for you.

Best Answer

Related Solutions

Sql-server – How to write windowing query which sums a column to create discrete buckets

Sql-server – Displaying Parent Child Information, With Certain Parent Columns Only Shown Once

Related Question