Hope someone can clarify how to do this query from SQL Server 2012, or maybe what it is asking;
it seems I am being asked to do a query that combines aggregates with non-aggregates, leading to what seems like an impossible query, since one can only group by fields that appear in the Select
clause, which messes up
the query.
I am asked to do a query on the table Invoices
:
Invoices (InvoiceID, VendorID,InvoiceDate, InvoiceTotal,...)
I am asked to do a query that produces 6 columns:
3 columns already appear "as is" in the table: VendorID, InvoiceDate, InvoiceTotal
And 3 other columns are aggregates:
4) Sum(InvoiceTotal) AS VendorTotal
: Sum of invoices for each VendorID
5) Count(InvoiceID) AS VendorCount
: Count of Invoices for each VendorId
6) Avg(InvoiceTotal) AS VendorAvg
: Average of Invoices per VendorId
Now, problem is that one cannot combine in the same query (with exceptions I cannot use here) aggregates and non-aggregates unless one groups by the non-
aggregates. So I can do:
Select
VendorId, InvoiceDate, InvoiceTotal,
Sum(InvoiceTotal) AS VendorTotal,
Count(InvoiceId) AS VendorCount,
Avg(InvoiceTotal) AS VendorAvg
GROUP BY
VendorID
Now, I have no trouble if I only group by VendorID
. But in order to get a
valid query, if I select either InvoiceDate
or InvoiceTotal
— both non-aggregates — I must group by each of them in order to get a valid query.
But grouping by these last two basically undoes the other aggregates: If
I group by invoice date, I lose the aggregation by Vendor, since each vendor
has different invoice dates.
I have tried doing self joins, using Invoices AS I1 join Invoices AS I2
and doing aggregates in I1
and non-aggregates on I2
, but this does not seem to work.
Am I missing something obvious here? Any Ideas?
Edit: The answer was found to be using OVER (PARTITION BY VendorId)
, like so:
Select
VendorId, InvoiceDate, InvoiceTotal,
Sum(InvoiceTotal) OVER (PARTITION BY VendorId) AS VendorTotal,
Count(InvoiceId) OVER (PARTITION BY VendorId) AS VendorCount,
Avg(InvoiceTotal) OVER (PARTITION BY VendorId) AS VendorAvg
GROUP BY
VendorID
Best Answer
The solution is what you already found, in DBMS like SQL Server 2005+ that have implemented window functions, we can use them to get aggregates
over
eachpartition
while keeping the original table, so not collapsing the table as agroup by
does. I'm not writing the query here, please edit your answer.In older versions (or other DBMS) that don't have the
OVER ()
clause syntax available, we could useGROUP BY
in a derived table (or a CTE) and then a "self" join back to the original table, like this: