SQL Server Query – Mixing Aggregates and Non-Aggregates

sql serversql-server-2012

Hope someone can clarify how to do this query from SQL Server 2012, or maybe what it is asking;
it seems I am being asked to do a query that combines aggregates with non-aggregates, leading to what seems like an impossible query, since one can only group by fields that appear in the Select clause, which messes up
the query.

I am asked to do a query on the table Invoices:

Invoices (InvoiceID, VendorID,InvoiceDate, InvoiceTotal,...)

I am asked to do a query that produces 6 columns:
3 columns already appear "as is" in the table: VendorID, InvoiceDate, InvoiceTotal

And 3 other columns are aggregates:

4) Sum(InvoiceTotal) AS VendorTotal: Sum of invoices for each VendorID

5) Count(InvoiceID) AS VendorCount: Count of Invoices for each VendorId

6) Avg(InvoiceTotal) AS VendorAvg: Average of Invoices per VendorId


Now, problem is that one cannot combine in the same query (with exceptions I cannot use here) aggregates and non-aggregates unless one groups by the non-
aggregates. So I can do:

Select 
    VendorId, InvoiceDate, InvoiceTotal, 
    Sum(InvoiceTotal) AS VendorTotal, 
    Count(InvoiceId) AS VendorCount, 
    Avg(InvoiceTotal) AS VendorAvg 
GROUP BY
    VendorID 

Now, I have no trouble if I only group by VendorID. But in order to get a
valid query, if I select either InvoiceDate or InvoiceTotal — both non-aggregates — I must group by each of them in order to get a valid query.

But grouping by these last two basically undoes the other aggregates: If
I group by invoice date, I lose the aggregation by Vendor, since each vendor
has different invoice dates.

I have tried doing self joins, using Invoices AS I1 join Invoices AS I2
and doing aggregates in I1 and non-aggregates on I2, but this does not seem to work.

Am I missing something obvious here? Any Ideas?

Edit: The answer was found to be using OVER (PARTITION BY VendorId) , like so:

Select 
VendorId, InvoiceDate, InvoiceTotal, 
    Sum(InvoiceTotal) OVER (PARTITION BY VendorId) AS VendorTotal, 
    Count(InvoiceId) OVER (PARTITION BY VendorId) AS VendorCount, 
    Avg(InvoiceTotal) OVER (PARTITION BY VendorId) AS VendorAvg 
GROUP BY
    VendorID 

Best Answer

The solution is what you already found, in DBMS like SQL Server 2005+ that have implemented window functions, we can use them to get aggregates over each partition while keeping the original table, so not collapsing the table as a group by does. I'm not writing the query here, please edit your answer.

In older versions (or other DBMS) that don't have the OVER () clause syntax available, we could use GROUP BY in a derived table (or a CTE) and then a "self" join back to the original table, like this:

SELECT 
    i.VendorID, i.InvoiceDate, i.InvoiceTotal, 
    grp.VendorTotal, 
    grp.VendorCount, 
    grp.VendorAvg 
FROM
      dbo.Invoices AS i             -- our original table
  JOIN
      ( SELECT 
            VendorID,
            Sum(InvoiceTotal) AS VendorTotal, 
            Count(InvoiceId) AS VendorCount, 
            Avg(InvoiceTotal) AS VendorAvg 
        FROM 
            dbo.Invoices  
        GROUP BY
            VendorID 
      ) AS grp                      -- the aggregated table 
  ON 
      grp.VendorID = i.VendorID ;