SQL Server – Improving Query Performance with Window Functions

sorting, sql server, window functions

We wrote a query that uses UNPIVOT, PARTITION BY and ORDER BY.
The query is:

SELECT PersonId
    ,SalaryDate
    ,ID
    ,Type
    ,SalaryValue
    ,ROW_NUMBER() OVER (
        PARTITION BY PersonId ORDER BY SalaryValue
        ) AS rn
FROM (
    SELECT lp.PersonId
        ,lp.SalaryDate
        ,lp.Salary1
        ,lp.Salary2
        ,lp.Salary3
        ,lp.ID
    FROM rdd.Salaries AS lp WITH (NOLOCK)
    WHERE lp.SalaryDate > DATEADD(day, - 31, getdate())
    ) AS t
UNPIVOT (SalaryValue FOR Type IN (
            -- The alias lp is out of scope here, and UNPIVOT accepts only unqualified column names
            Salary1
            ,Salary2
            ,Salary3
            )) AS UnpivotTable


The query returns about 68,000,000 rows and takes 20 minutes to execute.

Can I improve the query's performance, or rewrite it more effectively?
Is there an alternative to PARTITION BY?

Best Answer

You may find that the following index and query rewrite performs better, because it sorts per person rather than once over the whole set, and row estimates are more likely to be accurate:

-- Index
CREATE INDEX IX_Salaries_PersonId_SalaryDate_Inc_ID_Salary1_Salary2_Salary3
ON rdd.Salaries (PersonId, SalaryDate)
INCLUDE (ID, Salary1, Salary2, Salary3);

-- Query
WITH People AS
(
    SELECT DISTINCT
        S.PersonId
    FROM rdd.Salaries AS S
    WHERE 
        S.SalaryDate > DATEADD(DAY, -31, GETDATE())
)
SELECT 
    P.PersonId, 
    CA.SalaryDate, 
    CA.ID, 
    CA.SalaryValue, 
    CA.rn
FROM People AS P
CROSS APPLY
(
    SELECT
        S.SalaryDate, 
        S.ID, 
        V.SalaryValue, 
        rn = ROW_NUMBER() OVER (ORDER BY V.SalaryValue)
    FROM rdd.Salaries AS S
    CROSS APPLY
    (
        SELECT S.Salary1 WHERE S.Salary1 IS NOT NULL
        UNION ALL
        SELECT S.Salary2 WHERE S.Salary2 IS NOT NULL
        UNION ALL
        SELECT S.Salary3 WHERE S.Salary3 IS NOT NULL
    ) AS V (SalaryValue)
    WHERE 
        S.PersonId = P.PersonId
        AND S.SalaryDate > DATEADD(DAY, -31, GETDATE())
) AS CA
ORDER BY
    P.PersonId,
    CA.rn
OPTION (QUERYTRACEON 8649);

You can omit the OPTION clause if you find a parallel query is generated naturally, or if you find non-parallel performance is good enough. The desired plan shape is roughly as follows:

[Image: plan shape]
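To judge whether the rewrite actually helps on your data, one option is to run both versions with SQL Server's statistics output switched on and compare the reported CPU time, elapsed time and logical reads. The sketch below wraps the original UNPIVOT query; the alias names T and U are only illustrative, and it assumes Salary1, Salary2 and Salary3 share the same data type, as UNPIVOT requires.

```sql
-- Report CPU/elapsed time and logical reads for each statement in this session.
SET STATISTICS IO, TIME ON;

-- Candidate 1: the original UNPIVOT formulation.
SELECT
    U.PersonId,
    U.SalaryDate,
    U.ID,
    U.[Type],
    U.SalaryValue,
    rn = ROW_NUMBER() OVER (
            PARTITION BY U.PersonId
            ORDER BY U.SalaryValue)
FROM
(
    SELECT S.PersonId, S.SalaryDate, S.ID, S.Salary1, S.Salary2, S.Salary3
    FROM rdd.Salaries AS S
    WHERE S.SalaryDate > DATEADD(DAY, -31, GETDATE())
) AS T
UNPIVOT (SalaryValue FOR [Type] IN (Salary1, Salary2, Salary3)) AS U;

-- Candidate 2: run the CROSS APPLY rewrite here, with and without the OPTION clause,
-- and compare the figures shown on the Messages tab.

SET STATISTICS IO, TIME OFF;
```

With a result set this large, much of the elapsed time can be spent sending rows to the client, so CPU time and logical reads are often the more useful numbers to compare.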

Related Solutions

Sql-server – Window functions cause awful execution plan when called from a view with external parametrized ‘where’ clause

This appears to be a long-standing issue that keeps resurfacing in one form or another, and it is still present in SQL Server 2012.

Some posts discussing it are

All versions of SQL Server up to and including 2012 are unable to push a filter on the partitioning column past the sequence project for a parameterised predicate, unless OPTION (RECOMPILE) is used (SQL Server 2008 and later).

An alternative to the recompile hint would be to rewrite the query to use a parameterised inline TVF, as suggested by @a1ex07.
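As a rough illustration of the two options, here is a minimal sketch. The table dbo.Orders, the view dbo.RankedOrders and the function dbo.RankedOrdersForCustomer are all hypothetical names invented for this example; they are not from the original question. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical table, used only for this illustration.
CREATE TABLE dbo.Orders
(
    OrderId    int           NOT NULL PRIMARY KEY,
    CustomerId int           NOT NULL,
    OrderDate  date          NOT NULL,
    Amount     decimal(10,2) NOT NULL
);
GO

-- Hypothetical view: the window function is computed over every customer.
CREATE VIEW dbo.RankedOrders
AS
SELECT
    O.OrderId,
    O.CustomerId,
    O.OrderDate,
    O.Amount,
    rn = ROW_NUMBER() OVER (
            PARTITION BY O.CustomerId
            ORDER BY O.OrderDate DESC)
FROM dbo.Orders AS O;
GO

-- Hypothetical inline TVF: the parameter filters the rows inside the
-- function body, below the window function.
CREATE FUNCTION dbo.RankedOrdersForCustomer (@CustomerId int)
RETURNS TABLE
AS
RETURN
(
    SELECT
        O.OrderId,
        O.CustomerId,
        O.OrderDate,
        O.Amount,
        rn = ROW_NUMBER() OVER (
                PARTITION BY O.CustomerId
                ORDER BY O.OrderDate DESC)
    FROM dbo.Orders AS O
    WHERE O.CustomerId = @CustomerId
);
GO

DECLARE @CustomerId int = 42;

-- Option 1: keep the view and add the recompile hint.
SELECT R.OrderId, R.OrderDate, R.Amount, R.rn
FROM dbo.RankedOrders AS R
WHERE R.CustomerId = @CustomerId
OPTION (RECOMPILE);

-- Option 2: call the inline TVF with the parameter.
SELECT R.OrderId, R.OrderDate, R.Amount, R.rn
FROM dbo.RankedOrdersForCustomer(@CustomerId) AS R;
```

Comparing the execution plans of the two final statements, with and without the hint, should show whether the filter on CustomerId is applied before or after the row numbering on your version.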

Postgresql – Working of window functions and ideal window size for window function

The elementary difference is that window functions are applied to all rows in a result set to compute additional columns after the rest of the result set has been determined. No row is dropped. They are available since PostgreSQL 8.4.

The LIMIT and OFFSET clauses of the SELECT command, on the other hand, do not compute additional columns. They just pick a certain "window" of rows from the result set (in cooperation with the ORDER BY clause) and discard the rest. They have been available practically forever.

While certain tasks can be tackled with either of these tools, they are very different in nature.
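The difference can be seen side by side in a small sketch. The three sample rows and the column names id and date_col are made up for illustration; the inline VALUES list stands in for a real table.

```sql
-- Window function: every row is kept, and an extra column rn is computed.
WITH tbl (id, date_col) AS (
    VALUES (1, DATE '2024-01-01'),
           (2, DATE '2024-01-02'),
           (3, DATE '2024-01-03')
)
SELECT id,
       date_col,
       row_number() OVER (ORDER BY date_col DESC) AS rn
FROM   tbl
ORDER  BY rn;

-- LIMIT: no extra column is computed; rows beyond the first two are discarded.
WITH tbl (id, date_col) AS (
    VALUES (1, DATE '2024-01-01'),
           (2, DATE '2024-01-02'),
           (3, DATE '2024-01-03')
)
SELECT id, date_col
FROM   tbl
ORDER  BY date_col DESC
LIMIT  2;
```

The first query returns all three rows with their position in the ordering; the second returns only the two most recent rows.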

For your simple task

sorting data on date and then bring the latest data first

you don't need either of them. Just add:

ORDER BY date_col DESC

According to your comment, you would need:

SELECT col1, col2
FROM   tbl
ORDER  BY date_col DESC
LIMIT  100   -- 100 latest rows
OFFSET 0;    -- just noise, but may be easier to code

Retrieve more:

...
LIMIT  100
OFFSET 100;  -- next page of 100 rows ...

Be sure to have an index on date_col in either case!
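A minimal sketch of such an index, assuming the table and column names tbl and date_col from the examples above (the index name is arbitrary):

```sql
-- B-tree index on the sort column. PostgreSQL can scan a B-tree index
-- backward, so a plain ascending index also serves ORDER BY date_col DESC.
CREATE INDEX tbl_date_col_idx ON tbl (date_col);

-- Check whether the planner uses the index for the paging query.
EXPLAIN
SELECT col1, col2
FROM   tbl
ORDER  BY date_col DESC
LIMIT  100;
```

Whether the planner actually picks the index depends on table size and statistics, so it is worth checking the EXPLAIN output on real data.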
