SQL Server – How to Set Rank Per Group with a SQL UPDATE Statement?

rank, sql-server, t-sql

Please first refer to the table structure:

As you can see, there is one unique constraint covering manager_code, archive_year and archive_day_of_year.

I need to rank the managers within every group of rows that share the same branch_code, year and day of year. Here, branch_code stands for the department code.

I tried this:

I can get the correct rank number using RANK() on SQL Server, but I don't know how to write correct_rank back into the rank_in_department column of the open_account_by_manager_per_day table using an UPDATE statement.

Do you have any idea on how this can be done?

Best Answer

CTE:

You can embed your SELECT with RANK() in a CTE and then UPDATE the CTE.

WITH cte AS
(
    SELECT *, r = RANK() OVER(PARTITION BY archive_day, archive_year, branch_code ORDER BY open_count)
    FROM @data
)
UPDATE c 
SET rank_in_department = r 
FROM cte c;

Don't forget that the statement preceding the CTE must be terminated with a semicolon (;); otherwise WITH is not parsed as the start of a CTE.
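As a rough sketch, the CTE form might look like this against the table from the question. The table name and the columns manager_code, archive_year, archive_day_of_year, branch_code and rank_in_department come from the question; open_count is an assumption borrowed from the sample data, since the real name of the counted column is not shown, and @year is a hypothetical filter added only to show a preceding statement ending in a semicolon.

```sql
-- Assumption: open_count is the column to rank by (real name not shown in the question).
-- @year is a hypothetical filter; note the ';' that ends this statement before WITH.
DECLARE @year int = 2016;

WITH ranked AS
(
    SELECT rank_in_department,
           RANK() OVER (PARTITION BY branch_code, archive_year, archive_day_of_year
                        ORDER BY open_count) AS new_rank
    FROM open_account_by_manager_per_day
    WHERE archive_year = @year
)
-- Updating the CTE writes through to the underlying table.
UPDATE ranked
SET rank_in_department = new_rank;
```

If the highest count should receive rank 1, ORDER BY open_count DESC would be the variant to try.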

Sub Query:

You can also JOIN your table to a subquery that computes the expected RANK.

UPDATE d SET rank_in_department = r.r
FROM @data d
INNER JOIN (
    SELECT id
        , r = RANK() OVER(PARTITION BY archive_day, archive_year, branch_code ORDER BY open_count) 
    FROM @data
) r ON d.id = r.id;

This query expects an id, or a group of columns, to be present in both the subquery and the JOIN condition. It is used to uniquely identify each row and JOIN it back to the table. Judging from the data in your sample picture, this seems to be the combination of manager_code, archive_year and archive_day_of_year.
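When there is no single id column, the join can use the columns of the unique constraint instead. The following is a minimal sketch under that assumption; open_count is again a guessed column name taken from the sample data, not from the real table.

```sql
-- Assumption: open_count is the column to rank by (real name not shown in the question).
UPDATE d
SET rank_in_department = r.new_rank
FROM open_account_by_manager_per_day AS d
INNER JOIN (
    SELECT manager_code, archive_year, archive_day_of_year,
           RANK() OVER (PARTITION BY branch_code, archive_year, archive_day_of_year
                        ORDER BY open_count) AS new_rank
    FROM open_account_by_manager_per_day
) AS r
    -- Join on every column of the unique constraint so each row matches exactly once.
    ON  d.manager_code        = r.manager_code
    AND d.archive_year        = r.archive_year
    AND d.archive_day_of_year = r.archive_day_of_year;
```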

Sample Data used:

Both statements above run correctly against this sample data. The queries must be adapted to your real table(s).

DECLARE @data TABLE(id int identity(0, 1), archive_year int, archive_day int, branch_code nvarchar(5), rank_in_department int, open_count int)
INSERT INTO @data(archive_year, archive_day, branch_code, open_count) VALUES
    (2016, 1, 'X', 5)
    , (2016, 1, 'X', 15)
    , (2016, 1, 'X', 52)
    , (2016, 1, 'X', 36)
    , (2016, 1, 'X', 55)
    , (2016, 1, 'Y', 65)
    , (2016, 1, 'Y', 85)
    , (2016, 1, 'Y', 42)
    , (2016, 1, 'Y', 96)
    , (2016, 1, 'Y', 15);

SELECT *
    , r = RANK() OVER(PARTITION BY archive_day, archive_year, branch_code ORDER BY open_count)
FROM @data;

Related Solutions

Sql-server – Comparing DISTINCT, GROUP BY and ROW_NUMBER() in SQL Server 2008 with data warehouse workloads

In my experience, an aggregate (DISTINCT or GROUP BY) can be quicker than a ROW_NUMBER() approach. That said, ROW_NUMBER() performs better on SQL Server 2008 than on SQL Server 2005.

However, you'll have to test this for your own situation.
Compare query plans, and use Profiler and SET STATISTICS IO / SET STATISTICS TIME to capture I/O, CPU, duration, etc.
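One way to run such a comparison is sketched below. The @sales table and its CustomerID and Amount columns are hypothetical stand-ins for the real data; only YearID is a name taken from the discussion. The statistics output appears in the Messages tab in Management Studio.

```sql
-- Hypothetical sample table; replace with the real table under test.
DECLARE @sales TABLE (CustomerID int, YearID int, Amount int);
INSERT INTO @sales (CustomerID, YearID, Amount)
VALUES (1, 2008, 10), (1, 2008, 20), (2, 2008, 5);

SET STATISTICS IO ON;    -- reports logical/physical reads per table
SET STATISTICS TIME ON;  -- reports CPU time and elapsed time

-- Variant A: aggregate
SELECT DISTINCT CustomerID, YearID
FROM @sales;

-- Variant B: ROW_NUMBER(), keeping one row per (CustomerID, YearID)
SELECT x.CustomerID, x.YearID
FROM (
    SELECT CustomerID, YearID,
           ROW_NUMBER() OVER (PARTITION BY CustomerID, YearID ORDER BY Amount) AS rn
    FROM @sales
) AS x
WHERE x.rn = 1;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

With only three rows the numbers are meaningless; the point is the measurement pattern, which should be repeated against realistic data volumes.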

For a lot of background, see these SO questions:

Why are logical reads for windowed aggregate functions so high? (follow the links from Martin Smith)
can I get count() and rows from one sql query in sql server? (note the comments in the answers from me and Chris Bednarski)

Finally, do you need the ROW_NUMBER approach? It looks like you're fixing a problem caused by de-normalisation.

And some notes:

Shouldn't YearID be in the GROUP BY or PARTITION BY?
Won't DISTINCT give different output?
Are these columns indexed?

Sql-server – Cumulative Game Score SQL

Okay, so here is the query modified to work the way you want:

DECLARE @players table
(
    PlayerID uniqueidentifier NOT NULL PRIMARY KEY,
    PlayerName nvarchar(64) NOT NULL
);

DECLARE @playerScores table
(
    ID bigint NOT NULL IDENTITY PRIMARY KEY,
    PlayerID uniqueidentifier NOT NULL,
    DateCreated datetime NOT NULL,
    Score int NOT NULL,
    TimeTaken bigint NOT NULL,
    PuzzleID int NOT NULL
);

DECLARE @puzzleId int = 0;

SELECT TOP 50
    a.PlayerID,
    p.PlayerName,
    a.Score,
    a.TimeTaken,
    a.PlayedDate
    FROM
    (
        SELECT
            ps.PlayerID,
            ps.Score,
            ps.TimeTaken,
            ps.DateCreated AS PlayedDate,
            ROW_NUMBER()
                OVER
                (
                    PARTITION BY ps.PlayerID
                    ORDER BY ps.Score DESC, ps.TimeTaken, ps.DateCreated
                ) AS RN
            FROM @playerScores ps
            WHERE ps.PuzzleID = @puzzleId
    ) a
    INNER JOIN @players p ON p.PlayerID = a.PlayerID
    WHERE a.RN = 1
    ORDER BY
        a.Score DESC,
        a.TimeTaken,
        a.PlayedDate;

Having written this (note: indexes are not optimized), and looking at the other queries you're going to need to write, what I would actually recommend is to abandon this type of query entirely, and create a denormalized high-score table (rows are unique on the combination of PlayerID, PuzzleID), on which to run aggregates instead.
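A minimal sketch of what such a high-score table and its maintenance could look like follows. Every name here (@highScores, the parameter variables) is hypothetical, and a table variable is used only to keep the example self-contained; in practice this would be a permanent table. The "better score" rule mirrors the ORDER BY in the query above (higher Score first, then lower TimeTaken).

```sql
-- Hypothetical high-score table: one row per (PlayerID, PuzzleID).
DECLARE @highScores table
(
    PlayerID uniqueidentifier NOT NULL,
    PuzzleID int NOT NULL,
    Score int NOT NULL,
    TimeTaken bigint NOT NULL,
    PlayedDate datetime NOT NULL,
    PRIMARY KEY (PlayerID, PuzzleID)
);

-- Hypothetical incoming game result.
DECLARE @playerId uniqueidentifier = NEWID(),
        @puzzleId int = 0,
        @score int = 100,
        @timeTaken bigint = 4200,
        @playedDate datetime = GETDATE();

-- Upsert: insert the first result, or replace the stored one if the new one is better.
MERGE @highScores AS t
USING (SELECT @playerId AS PlayerID, @puzzleId AS PuzzleID, @score AS Score,
              @timeTaken AS TimeTaken, @playedDate AS PlayedDate) AS s
    ON t.PlayerID = s.PlayerID AND t.PuzzleID = s.PuzzleID
WHEN MATCHED AND (s.Score > t.Score
                  OR (s.Score = t.Score AND s.TimeTaken < t.TimeTaken)) THEN
    UPDATE SET Score = s.Score, TimeTaken = s.TimeTaken, PlayedDate = s.PlayedDate
WHEN NOT MATCHED THEN
    INSERT (PlayerID, PuzzleID, Score, TimeTaken, PlayedDate)
    VALUES (s.PlayerID, s.PuzzleID, s.Score, s.TimeTaken, s.PlayedDate);

-- The leaderboard then needs no ROW_NUMBER() at all.
SELECT TOP 50 PlayerID, Score, TimeTaken, PlayedDate
FROM @highScores
WHERE PuzzleID = @puzzleId
ORDER BY Score DESC, TimeTaken, PlayedDate;
```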

The reason is that the GameResult table is going to grow huge in the database, so running aggregates on it directly will become less and less efficient as time passes, and the requirements are incompatible with doing something like creating an indexed view to summarize the information.

Also, if you aren't doing this already, it's highly likely you'll want to use an asynchronous process to compute the "leaderboards" periodically and cache the results, instead of computing them just-in-time. (You could do something like merge the current player's score with the cached leaderboards so that players can see themselves on the leaderboards immediately if they got a high score.) See my answer here for some ideas to consider when implementing a caching mechanism.