SQL Server Window Functions – Conditional Window Function with a Twist

sql-server, window-functions

Asking for a friend who uses SQL Server Parallel Warehouse.

They™ have a table of weekly sales amounts, like so (please forgive improper column justification):

+-------+------------+
| week  |  amount    |
+-------+------------+
|    1  | 100.00     |
|    2  | 100.00     |
|    3  | 100.00     |
|    4  | 100.00     |
|    5  | 100000.00  |
|    6  | 100.00     |
|    7  | 50000.00   |
|    8  | 50000.00   |
|    9  | 50000.00   |
|   10  | 100.00     |
+-------+------------+

And also a list of "bad" weeks, e.g.

+------+
| week |
+------+
|    5 |
|    7 |
|    8 |
|    9 |
+------+
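For anyone who wants to reproduce the problem, the two tables above could be set up roughly as follows. This is only a sketch for a regular SQL Server instance: the table names `data` and `bad_weeks` are taken from the answer's query, the column types are assumptions, and Parallel Data Warehouse may need different table options.

```sql
-- Assumed column types; only the names week and amount come from the question.
CREATE TABLE data
(
    week   INT            NOT NULL,
    amount DECIMAL(20, 2) NOT NULL
);

CREATE TABLE bad_weeks
(
    week INT NOT NULL
);

-- Weekly sales amounts exactly as listed in the question.
INSERT INTO data (week, amount)
VALUES (1, 100.00), (2, 100.00), (3, 100.00), (4, 100.00), (5, 100000.00),
       (6, 100.00), (7, 50000.00), (8, 50000.00), (9, 50000.00), (10, 100.00);

-- The "bad" weeks to be skipped when summing.
INSERT INTO bad_weeks (week)
VALUES (5), (7), (8), (9);
```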

And They™ need to select, for each week (including "bad" weeks), the sum of sales for the four preceding non-"bad" weeks, i.e. going back as far as needed, skipping "bad" week records, to add up at most four sales amounts. So the expected result would be:

+-------+------------+
| week  | sum_not_bad|
+-------+------------+
|    1  | null       |
|    2  | 100.00     |
|    3  | 200.00     |
|    4  | 300.00     |
|    5  | 400.00     |
|    6  | 400.00     |
|    7  | 400.00     |
|    8  | 400.00     |
|    9  | 400.00     |
|   10  | 400.00     |
+-------+------------+

I have a fiddle that, I think, takes one step in the right direction, but I can't figure out the next step(s).

Does anyone have insights?

Best Answer

Here's one way. It uses PARTITION BY to group all the good weeks together and compute, for each good week, the sum of amounts over the preceding four good weeks. It then takes an approach along the lines of the "Solution 2 Using Concatenation" technique to work around the lack of support for LAST_VALUE ignoring NULLs, and so cascade the previous "good" value down to the rows that follow.

It keeps track of two windowed sums. One includes the current row (it is picked up by any "bad" rows that follow, up to the next good row) and one excludes the current row (it is used by the good row itself).
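To see the carry-forward step on its own, here is a minimal sketch on made-up rows (the `S` table and its `id` and `val` columns are hypothetical, not part of the question). Each non-NULL value is prefixed with its zero-padded `id`, so a running MAX over the strings always returns the most recent non-NULL entry, and SUBSTRING then strips the ten-character prefix. It assumes non-negative ids of at most ten digits and the default behaviour where `+` with a NULL operand yields NULL.

```sql
-- Hypothetical sample: val is NULL on rows that should inherit the previous value.
WITH S AS
(
    SELECT V.id, V.val
    FROM (VALUES (1, CAST(10.00 AS DECIMAL(20, 2))),
                 (2, NULL),
                 (3, NULL),
                 (4, 25.50),
                 (5, NULL)) AS V (id, val)
)
SELECT id,
       val,
       -- Running MAX ignores the NULL strings, so the latest non-NULL row wins.
       CAST(SUBSTRING(MAX(RIGHT(CONCAT('0000000000', id), 10) + CAST(val AS VARCHAR(20)))
                        OVER (ORDER BY id), 11, 20) AS DECIMAL(20, 2)) AS last_non_null_val
FROM   S
ORDER  BY id;
-- last_non_null_val: 10.00, 10.00, 10.00, 25.50, 25.50
```

The full query below applies the same expression to each of the two windowed sums.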

WITH T
     AS (SELECT d.week,
                d.amount,
                -- 1 when the week appears in bad_weeks, otherwise 0
                CASE WHEN b.week IS NULL THEN 0 ELSE 1 END AS is_bad_week,
                -- Sum of the four good weeks before this one (used by good weeks)
                SUM(CASE WHEN b.week IS NULL THEN d.amount END)
                  OVER (PARTITION BY CASE WHEN b.week IS NULL THEN 0 ELSE 1 END
                        ORDER BY d.week
                        ROWS BETWEEN 4 PRECEDING AND 1 PRECEDING) AS cume_sum_prev4toprev1,
                -- Sum of this good week and the three good weeks before it (picked up by later bad weeks)
                SUM(CASE WHEN b.week IS NULL THEN d.amount END)
                  OVER (PARTITION BY CASE WHEN b.week IS NULL THEN 0 ELSE 1 END
                        ORDER BY d.week
                        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) AS cume_sum_prev3tocurrent
         FROM   data d
                LEFT JOIN bad_weeks b
                       ON d.week = b.week)
-- Both sums are NULL on bad weeks, so the running MAX over "zero-padded week + sum"
-- returns the entry of the latest good week up to the current row; SUBSTRING strips the padding.
SELECT week,
       CASE WHEN is_bad_week = 1 THEN
       CAST(SUBSTRING(MAX(RIGHT(CONCAT('0000000000', week), 10) + CAST(cume_sum_prev3tocurrent AS VARCHAR(20))) OVER (ORDER BY week), 11, 20) AS DECIMAL(20, 2))
       ELSE
       CAST(SUBSTRING(MAX(RIGHT(CONCAT('0000000000', week), 10) + CAST(cume_sum_prev4toprev1 AS VARCHAR(20))) OVER (ORDER BY week), 11, 20) AS DECIMAL(20, 2))
       END AS sum_not_bad
FROM   T
ORDER  BY week
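As a way to cross-check the output, the requirement can also be written directly as "take the four most recent good weeks before this one and sum them". The sketch below does that with OUTER APPLY and TOP. It is not the answer's method, it is written for a regular SQL Server instance against the same `data` and `bad_weeks` tables, and whether it runs (or performs acceptably) on Parallel Data Warehouse has not been verified.

```sql
SELECT d.week,
       p.sum_not_bad
FROM   data AS d
       OUTER APPLY (SELECT SUM(g.amount) AS sum_not_bad
                    FROM   (-- Up to four most recent earlier weeks that are not "bad"
                            SELECT TOP (4) x.amount
                            FROM   data AS x
                            WHERE  x.week < d.week
                                   AND NOT EXISTS (SELECT 1
                                                   FROM   bad_weeks AS b
                                                   WHERE  b.week = x.week)
                            ORDER  BY x.week DESC) AS g) AS p
ORDER  BY d.week;
```

For week 1 there are no earlier good weeks, so SUM returns NULL, which matches the first row of the expected result.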

Related Solutions

SQL Server – Improving Query Performance with Window Functions

You may find that the following index and query rewrite performs better, because it sorts per person rather than once over the whole set, and row estimates are more likely to be accurate:

-- Index
CREATE INDEX IX_Salaries_PersonId_SalaryDate_Inc_ID_Salary1_Salary2_Salary3
ON rdd.Salaries (PersonId, SalaryDate)
INCLUDE (ID, Salary1, Salary2, Salary3);

-- Query
WITH People AS
(
    SELECT DISTINCT
        S.PersonId
    FROM rdd.Salaries AS S
    WHERE 
        S.SalaryDate > DATEADD(DAY, -31, GETDATE())
)
SELECT 
    P.PersonId, 
    CA.SalaryDate, 
    CA.ID, 
    CA.SalaryValue, 
    CA.rn
FROM People AS P
CROSS APPLY
(
    SELECT
        S.SalaryDate, 
        S.ID, 
        V.SalaryValue, 
        rn = ROW_NUMBER() OVER (ORDER BY V.SalaryValue)
    FROM rdd.Salaries AS S
    CROSS APPLY
    (
        SELECT S.Salary1 WHERE S.Salary1 IS NOT NULL
        UNION ALL
        SELECT S.Salary2 WHERE S.Salary2 IS NOT NULL
        UNION ALL
        SELECT S.Salary3 WHERE S.Salary3 IS NOT NULL
    ) AS V (SalaryValue)
    WHERE 
        S.PersonId = P.PersonId
        AND S.SalaryDate > DATEADD(DAY, -31, GETDATE())
) AS CA
ORDER BY
    P.PersonId,
    CA.rn
OPTION (QUERYTRACEON 8649);

You can omit the OPTION clause if you find a parallel query is generated naturally, or if you find non-parallel performance is good enough. The desired plan shape is roughly as follows:

[Image: Plan Shape]

SQL Server 2014 – Using DISTINCT in Window Function with OVER

Does anyone know what the problem is? Is this kind of query possible in SQL Server?

No, it isn't currently implemented. See the following Connect item request:

OVER clause enhancement request - DISTINCT clause for aggregate functions

Another possible variant would be

SELECT M.A,
       M.B,
       T.A_B
FROM   MyTable M
       JOIN (SELECT CAST(COUNT(DISTINCT A) AS NUMERIC(18,8)) / SUM(COUNT(*)) OVER() AS A_B,
                    B
             FROM   MyTable
             GROUP  BY B) T
         ON EXISTS (SELECT M.B INTERSECT SELECT T.B)

The cast to NUMERIC is there to avoid integer division. The join condition is written with EXISTS and INTERSECT so that rows where B is NULL on both sides still match each other.

It can be replaced with ON M.B = T.B OR (M.B IS NULL AND T.B IS NULL) if preferred (or simply ON M.B = T.B if the B column is not nullable).
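The difference between the two join conditions comes down to how NULLs are compared. A minimal sketch, using literal NULLs rather than the `MyTable` columns, shows that INTERSECT treats two NULLs as the same value while the `=` operator does not:

```sql
SELECT CASE WHEN EXISTS (SELECT CAST(NULL AS INT)
                         INTERSECT
                         SELECT CAST(NULL AS INT))
            THEN 'match' ELSE 'no match' END AS intersect_compare,  -- 'match'
       CASE WHEN CAST(NULL AS INT) = CAST(NULL AS INT)
            THEN 'match' ELSE 'no match' END AS equals_compare;     -- 'no match'
```

This is why a plain ON M.B = T.B is only equivalent when the B column cannot contain NULLs.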
