T-sql – Combining previously different rows after changing value during query

casegroup bysubstringt-sql

I'm currently running a query that is supposed to return a salesperson's aggregated revenue, grouped by individual clients.

Because of how user input was handled, there are a lot of near-duplicate rows with minor mispellings/additions to client names. As you can see in the code below, I'm trying to trim off one of the problem strings in order to prevent duplicate rows for the same clients.

However, though the CASE statement successfully removes "@iraPrefix", the rows with now-identical client names are still not grouped together because their revenue fields are not being summed.

How should I approach this?

DECLARE @iraPrefix VARCHAR(7)
DECLARE @iraLen INT
SET @iraPrefix = 'FOO BAR '
SET @iraLen = LEN(@iraPrefix)

SELECT [client] = (
           CASE SUBSTRING([name], 1, @iraLen)
               WHEN @iraPrefix THEN SUBSTRING([name], @iraLen+2, (LEN([name]) - @iraLen))
               ELSE [name]
           END
       )
       , CONCAT( sal.firstname
               , ' '
               , sal.lastname) AS [salesman]
       , [revenue] = SUM(net)

Here's an example of the query results without prefix removal:

enter image description here

An example of the current query results:

enter image description here

And an example of the results I want:

enter image description here

Best Answer

Just aggregate one more time.

You can do this by

  • Materialising your intermediate resultset (SELECT ... INTO #intermediate FROM...) and then aggregating the cached intermediate results
  • Wrapping your current query in a set of brackets and SELECTing against whatever alias you assign to it.

See this fiddle for an example. But for verbosity's sake, assuming your data is all stored in a table called barfoo, then one way you could do this using a Common Table Expression (or CTE) is...

DECLARE @iraPrefix VARCHAR(7);
DECLARE @iraLen INT;
SET @iraPrefix = 'FOO BAR ';
SET @iraLen = LEN(@iraPrefix);

WITH L1 AS (
   SELECT [client] = (
           CASE SUBSTRING([name], 1, @iraLen)
               WHEN @iraPrefix THEN SUBSTRING([name], @iraLen+2, (LEN([name]) - @iraLen))
               ELSE [name]
           END
       )
       , CONCAT( sal.firstname
               , ' '
               , sal.lastname) AS [salesman]
       , [revenue] = SUM(net)
   FROM barfoo as sal
   GROUP BY [name],CONCAT( sal.firstname
                      , ' '
                      , sal.lastname) 
)
SELECT 
        [client],
        [salesman],
        SUM([revenue]) AS [revenue]
FROM L1
GROUP BY [client],
        [salesman];