The Problem
My SQL Server database contains a view that looks like the following:
+--------+---------+-------+-------+-------+-------+
| CAR_ID | MAKE    | COLOR | Miles | Seats | MODEL |
+--------+---------+-------+-------+-------+-------+
| 1      | FORD    | RED   |   100 |     4 | CAR   |
| 2      | TOYOTA  | RED   |   500 |     2 | TRK   |
| 3      | FORD    | BLACK |    10 |     4 | CAR   |
| 4      | HYUNDAI | BLUE  |   150 |     4 | VAN   |
| ...    | ...     | ...   |   ... |   ... | ...   |
| 1000   | TOYOTA  | GRN   |   200 |     8 | CAN   |
+--------+---------+-------+-------+-------+-------+
I would like to create custom groupings from this table. My desired output might look like the following:
+------------+------------+
| Group      | sum(Miles) |
+------------+------------+
| Red Cars   |        500 |
| Red Toyota |        325 |
| Trucks     |        563 |
| Black Car  |        952 |
| ...        |        ... |
| Black Ford |        100 |
+------------+------------+
Indeed, it looks like a simple GROUP BY query. However, in my case the groups can become rather complex, with many logical conditions, and the number of groups is also very large, which makes a GROUP BY query intractable for me. My solution has been to use the WHERE clause to create custom groups. Some may call this dynamic SQL.
My Solution
I have created a C# script in which I store the group definitions and the query itself. The group definitions are stored in a dictionary. I iterate through the dictionary, passing each grouping to the WHERE clause of a SELECT statement, and then send the SQL statement to the SQL Server.

The server returns a DataTable with my custom grouping to my C# application. I then bulk insert the DataTable into another table on the SQL Server. If I have 50 groupings, I query the SQL Server 50 times and bulk insert 50 times. Here is an example:
var groups = new Dictionary<string, string>
{
    { "GRP1", "COLOR = 'BLACK' OR Miles < 50" },
    { "GRP2", "Seats < 5 AND MODEL = 'CAR'" }
};

foreach (var key in groups.Keys)
{
    string sql = "SELECT * FROM CAR_TABLE WHERE " + groups[key];
    DataTable dt = QueryServer(sql);
    BulkInsertToDB(dt);
}
Question
There must be a better and more efficient way of doing this. I was thinking about putting my query into a stored procedure on the server, but I would still have to call the stored procedure for each grouping. I then thought perhaps there is a way to store these groupings server-side and run all of the queries in a single stored procedure. Any guidance would be helpful.
Clarifications:
One row can belong to many groups. I am aggregating both the sum of miles and the sum of vehicle age. I use the dictionary key as the group name and pass it to the SQL Server; I then update the DataTable in C# with the group name.
Best Answer
There are a number of ways to do this purely in SQL. They will, logically enough, also require dynamic SQL.
First, copy your dictionary into a table in SQL Server - I'll refer to it as Group_Def, with columns Group_Name and Where_Clause.

Option 1: replicate your logic in SQL. For each group, generate a series of SQL statements (in a variable) that will add the group name and the CAR_ID of every car matching your criteria to a temporary table. Execute the SQL statements in the variable to populate the temp table. Then, join that temporary table to CAR_TABLE on CAR_ID, group by Group_Name, and apply your aggregates.

Option 2: grab the aggregate values as you go. This time, instead of writing all the matching CAR_ID values into the temp table, write the actual aggregate values. Then, simply select from the temp table.

It may be easier to follow if you see the code:
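A minimal sketch of the two options, assuming a single SUM(Miles) aggregate and using STRING_AGG, which requires SQL Server 2017+ (on older versions you would build @sqlcmd with FOR XML PATH or a cursor instead):

```sql
-- Assumed setup: Group_Def(Group_Name, Where_Clause) is already populated,
-- e.g. ('Red Cars', 'COLOR = ''RED'' AND MODEL = ''CAR''').

DECLARE @sqlcmd NVARCHAR(MAX);

-- Option 1: collect matching CAR_IDs per group, then aggregate once.
CREATE TABLE #Group_Members (Group_Name VARCHAR(100), CAR_ID INT);

SELECT @sqlcmd = STRING_AGG(
    CAST('INSERT INTO #Group_Members (Group_Name, CAR_ID) '
       + 'SELECT ''' + Group_Name + ''', CAR_ID '
       + 'FROM CAR_TABLE WHERE ' + Where_Clause AS NVARCHAR(MAX)),
    '; ')
FROM Group_Def;

EXEC sp_executesql @sqlcmd;

SELECT gm.Group_Name, SUM(c.Miles) AS Total_Miles
FROM #Group_Members AS gm
JOIN CAR_TABLE AS c ON c.CAR_ID = gm.CAR_ID
GROUP BY gm.Group_Name;

-- Option 2: compute the aggregates inside the dynamic SQL itself.
CREATE TABLE #Group_Totals (Group_Name VARCHAR(100), Total_Miles INT);

SELECT @sqlcmd = STRING_AGG(
    CAST('SELECT ''' + Group_Name + ''' AS Group_Name, '
       + 'SUM(Miles) AS Total_Miles '
       + 'FROM CAR_TABLE WHERE ' + Where_Clause AS NVARCHAR(MAX)),
    ' UNION ALL ')
FROM Group_Def;

INSERT INTO #Group_Totals (Group_Name, Total_Miles)
EXEC sp_executesql @sqlcmd;

SELECT Group_Name, Total_Miles FROM #Group_Totals;
```

The CAST to NVARCHAR(MAX) matters here: without it, STRING_AGG can truncate the concatenated command at 8,000 bytes once you have many groups.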
Option 2 may be slightly faster, as you aren't having to calculate the aggregates in the final SELECT. Note, however, that even though I've built @sqlcmd as a single SQL statement, the UNION ALL is joining together functionally separate queries. Performance-wise, you could rewrite this so that each row in #Group_Def generated a distinct INSERT statement, and performance would not change significantly: you're basically running one statement per group regardless.

I can envision a third option that would collect all the aggregates in a single step; however, it's not as scalable, it requires that all your aggregates can be expressed in terms of sums or counts, and it still ultimately requires a SELECT per group.

You would build a single SELECT statement, with a column for each value you want to record for each group. (You'd want to add a version of each group name without spaces.) To collect the aggregates, you would use SUM and CASE expressions; no GROUP BY would be necessary, since all output columns would be aggregates.

Then, you would build the SELECT statements that pull all the values you need out of the #single_row_total table (again with dynamic SQL, going through the Group_Def table to build the statement), using UNION ALL to tie them together.

This would allow all the aggregates to be collected in a single set-based operation. As noted, breaking the single result into separate rows still requires one SELECT per group; however, those selects run against a single row, so they should be quite quick.

This isn't scalable because there's an upper limit on the number of columns a single table can have, so a sufficient number of groups, times a sufficient number of aggregates per group, could require more columns than are allowed.

Also, while the query pulling the aggregates is a single SQL statement, evaluating every CASE expression against every row could produce an execution plan hideous enough that the one-SELECT-per-group options would perform as well or better.

So, I leave the creation of Option 3 (as well as testing with much larger data sets, to determine the fastest option) up to the OP, or another interested party.
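For concreteness, a rough, untested sketch of what Option 3 might look like, with two hypothetical groups ("Red Cars" and "Trucks") and a single Miles aggregate hard-coded; in practice both statements would themselves be generated from Group_Def with dynamic SQL:

```sql
-- Step 1: one set-based pass collects every group's aggregate into a
-- single row, with one column per (group, aggregate) pair. No GROUP BY.
SELECT
    SUM(CASE WHEN COLOR = 'RED' AND MODEL = 'CAR'
             THEN Miles ELSE 0 END) AS RedCars_Miles,
    SUM(CASE WHEN MODEL = 'TRK'
             THEN Miles ELSE 0 END) AS Trucks_Miles
INTO #single_row_total
FROM CAR_TABLE;

-- Step 2: unpivot that single row back into one row per group,
-- one (very cheap) SELECT per group.
SELECT 'Red Cars' AS Group_Name, RedCars_Miles AS Total_Miles
FROM #single_row_total
UNION ALL
SELECT 'Trucks', Trucks_Miles
FROM #single_row_total;
```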
As to the dynamic SQL aspect of this: unless the groups can be defined on an ad hoc basis, you're building your SQL statements with no user-supplied values, so there's no risk of SQL injection. And (I think) if #Group_Def doesn't change, the generated plan might wind up being reusable (not sure, not planning on checking; it wasn't part of the original question).