Sql-server – Reorder records in table

performancequery-performancesortingsql serverupdate

I am looking for a SQL solution to reorder records by updating order column.

Consider following data structure:

------------------------------------
| id | type | value | order | user |
------------------------------------
|  1 |    A |   123 |     1 |    1 |
|  2 |    B |   231 |     2 |    1 |
|  3 |    B |   213 |     3 |    2 |
|  4 |    A |   222 |     4 |    2 |
|  5 |    A |    22 |     5 |    3 |
|  6 |    B |    11 |     6 |    4 |
|  7 |    C |    99 |     7 |    1 |
------------------------------------

Software(non-modifiable) that uses this data, takes order in account and applies value with max order for each user.
However I need that type A records are applied first (have max order for particular users entries).
Order column is unique.

Current solution is to create temporary table, selecting data in desired order and adding new identify column. Effectively rebuilding whole order column. Like this:

BEGIN TRANSACTION;

-- first we take type C
SELECT id
    ,type
    ,value
    ,[ORDER]
    ,user
INTO #tmp
FROM records
WHERE type = C;

-- we add new identify column
ALTER TABLE #tmp ADD new_order INT identify (1,1);

INSERT INTO #tmp
SELECT id
    ,type
    ,value
    ,[ORDER]
    ,user
FROM records
WHERE type = B;

-- last we take type A, these will have largest `order` values
INSERT INTO #tmp
SELECT id
    ,type
    ,value
    ,[ORDER]
    ,user
FROM records
WHERE type = A;

-- old update method
--update records set records.order = #tmp.new_order from records
--join #tmp on #tmp.id = records.id
COMMIT TRANSACTION;

-- new method thanks @Lennart
MERGE INTO records x
USING #tmp
    ON #tmp.id = records.id
WHEN MATCHED
    THEN
        UPDATE
        SET records.
        ORDER = #tmp.new_order;

DROP TABLE #tmp;

However this solution locks the table for too long, which is not acceptable during business hours.

Is it possible and effective to split this per user?

Take all records of user 1
Reorder using only order values of given records (not sure how to do this)
Update records table
Repeat for next user

I understand that the whole process will take much longer, but rest of the table will not be locked for other operations?

I am looking for a SQL solution.

UPDATE

I believe I did not elaborate enough.

Column "order" is unique. I am interested if there is a SQL way to:

Get all records for each user. e.g.:
select * from records where user = 1;
Sort records for each user using values available.
In case of user=1 we have only values (1,2,7) to operate with.
Update order column in records table

Desired result for give example:

------------------------------------
| id | type | value | order | user |
------------------------------------
|  1 |    A |   123 |     7 |    1 |
|  2 |    B |   231 |     2 |    1 |
|  3 |    B |   213 |     3 |    2 |
|  4 |    A |   222 |     4 |    2 |
|  5 |    A |    22 |     5 |    3 |
|  6 |    B |    11 |     6 |    4 |
|  7 |    C |    99 |     1 |    1 |
------------------------------------

UPDATE #2:
This is what I theoretically meant. However in practice this will lock the table for much longer periods than current solution and, over time, make all operations with the table slower.

DECLARE @curr_usr INT

DECLARE curr CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR
SELECT DISTINCT user
FROM records

-- foreach user
OPEN curr

FETCH NEXT
FROM curr
INTO @curr_usr

WHILE @@fetch_status = 0
BEGIN
    -- get user entries
    SELECT *
        ,row_number() OVER (
            ORDER BY type
            ) AS new_order
    INTO #tmp
    FROM records
    WHERE user = @curr_usr

    -- sort available order values for given record set
    MERGE INTO #tmp
    USING (
        SELECT oder
            ,row_number() OVER (
                ORDER BY
                [ORDER]
                ) AS order_seq
        FROM records
        WHERE user = @curr_usr
        ) x
        ON x.order_seq = #tmp.new_order
    WHEN MATCHED
        THEN
            UPDATE
            SET #tmp.
            [ORDER] = x.
            [ORDER];

    -- update main table
    MERGE INTO records
    USING #tmp
        ON #tmp.id = records.id
    WHEN MATCHED
        THEN
            UPDATE
            SET records.
            [ORDER] = #tmp.
            [ORDER];

    FETCH NEXT
    FROM curr
    INTO @curr_usr

    TRUNCATE TABLE #tmp
END

CLOSE curr

UPDATE #3 Some clarification about data amount and time

There are roughly 600k records for ~10k users.
solution #1 (stated in question) takes ~ 29 seconds
solution adapted from @lennart answer takes ~ 27 seconds
solution adapted from @daniel answer takes ~ 26 seconds

This is test data, with no users and services working on it, sorter being the only one.
I believe that achieving <10 seconds would be enough for clients not to notice the lag.

It appears that I will have to adapt @daniel solution, limited to user where user = ? and call this for each user separately with delay between calls, this way clients should not notice any delays, right?

Best Answer

Table

CREATE TABLE dbo.Records
(
    id integer NOT NULL
        CONSTRAINT [PK dbo.Records id]
        PRIMARY KEY CLUSTERED (id),
    [type] character(1) NOT NULL,
    [value] integer NOT NULL,
    [order] smallint NOT NULL,
    [user] integer NOT NULL,

    -- order is unique
    CONSTRAINT [UQ dbo.Records order]
        UNIQUE NONCLUSTERED ([order] DESC)
);

Indexes

-- type is unique per user, useful to include order
CREATE UNIQUE INDEX [UQ dbo.Records user, type (order)]
ON dbo.Records ([user], [type])
INCLUDE ([order]);

-- useful index on user, order
CREATE NONCLUSTERED INDEX [UQ dbo.Records user, order]
ON dbo.Records ([user] ASC, [order] DESC);

Sample data

INSERT dbo.Records
    (id, [type], [value], [order], [user])
VALUES
    (1, 'A', 123, 1, 1),
    (2, 'B', 231, 2, 1),
    (3, 'B', 213, 3, 2),
    (4, 'A', 222, 4, 2),
    (5, 'A',  22, 5, 3),
    (6, 'B',  11, 6, 4),
    (7, 'C',  99, 7, 1);

Solution

The following is similar to other answers in outline (joining two sets of numbered rows), but it optimizes the join (avoiding nested loops), and avoids changing rows where the order is not changed by the application of the algorithm. Comments inline:

WITH
    Existing AS
    (
        -- Number existing rows in order, per user
        SELECT
            R.[user],
            R.[order],
            rn = ROW_NUMBER() OVER (
                PARTITION BY R.[user] 
                ORDER BY R.[order] DESC)
        FROM dbo.Records AS R 
    ),
    New AS
    (
        -- Number existing rows by type, per user
        SELECT
            R.[user],
            R.[order], 
            rn = ROW_NUMBER() OVER (
                PARTITION BY R.[user] 
                ORDER BY R.[type] ASC)
        FROM dbo.Records AS R 
    ),
    OptNew AS
    (
        -- Help the optimizer see that New.rn is unique per user
        SELECT
            New.[user],
            [order] = MAX(New.[order]), -- Meaningless, but required
            New.rn
        FROM New
        GROUP BY 
            New.[user], 
            New.rn
    )
UPDATE E
SET [order] = N.[order]
FROM Existing AS E
JOIN OptNew AS N
    ON N.[user] = E.[user]
    AND N.rn = E.rn
WHERE
    -- Only update if changed
    N.[order] <> E.[order];

db<>fiddle

Execution plan

Related Solutions

Sql-server – Updating a table with more than 850 million rows of data

To speed things up, you could try

Adding a primary key to #CSpec.RowID so you don't scan it every iteration
Change the CTE to a temp table with suitable PK. See next point too
Add an index on SourceTable1 to match the CTE WHERE clause: currently the PK will be scanned, meaning all SourceTable1 rows will be scanned every iteration. All 52 million rows
SourceTable2.MarketID also does not have an index, but I wouldn't worry about this because it is scanned once only (as I understand it)

The query plans here should show a lot of scans because you have poor indexes for the operations you are doing.

Target table indexing appears OK

Another observation: uniqueidentifier and varchar are bad choices for clustered indexes (your PKs here): too wide, not increasing, overhead of collection comparisons at least

Edit, another observation (thanks to @Marian)

Your clustered index is wide generally. Every non-clustered index points to the clustered index, which means a huge NC index too

You could probably achieve the same result by reordering the clustered PK.

Sql-server – How to delete only related records in a multi-key MERGE in SQL Server

This is the separate DELETE operation I had in mind:

DELETE m
FROM dbo.Mapping AS m
WHERE EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID)
AND NOT EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID AND RightID = m.RightID);

As I outline here, for a left anti-semi join, the NOT EXISTS pattern will often outperform the LEFT JOIN / NULL pattern (but you should always test).

Not sure if your overall goal is clarity or performance, so only you can judge if this will work out better for your requirements than the NOT MATCHED BY source option. You'll have to look at the plans qualitatively, and the plans and/or runtime metrics quantitatively, to know for sure.

If you expect your MERGE command to protect you from race conditions that would happen with multiple independent statements, you better make sure that is true by changing it to:

MERGE dbo.Mapping WITH (HOLDLOCK) AS target

(From Dan Guzman's blog post.)

Personally, I would do all of this without MERGE, because there are unresolved bugs, among other reasons. And Paul White seems to recommend separate DML statements as well.

And here's why I added a schema prefix: you should always reference objects by schema, when creating, affecting, etc.