After some excellent help from Kin, wBob, and Aaron Bertrand, I finally found my root cause: Table1 statistics needed updating.
I had already confirmed that statistics were updated two days prior during our weekly index maintenance, so I didn't think much of it. However, I gave it a try, and it took the query run time from 2 hours 30 minutes down to 2 seconds!
The command run was:
UPDATE STATISTICS Table1;
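For anyone who wants to verify how stale statistics actually are before updating them, here is a minimal sketch using the standard DMV sys.dm_db_stats_properties (Table1 and the dbo schema are assumptions based on this post):

SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter -- changes to the leading column since the last update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Table1');

A high modification_counter relative to rows suggests the statistics no longer reflect the data, even when last_updated looks recent.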
Thanks for all the help, particularly the pointers in the article "The 'Runaway' Query".
I'd like to know why the optimizer does not use the clustered index but uses the non-clustered one instead.
This will be a decision of the cost-based optimizer. It estimates that it is cheaper to fully scan the narrow non-clustered index. It seems that you were expecting a nested loops join with seeks on the clustered index?
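If you want to compare the plan the optimizer picked against the one you expected, you could (purely as an experiment, not as a production fix) force seeks on the target table with a table hint. The table names and aliases here are taken from the query discussed below:

-- Experimental only: force seek access on Production_Detail
DELETE pd
FROM #ToPurge tp
     INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd WITH (FORCESEEK)
         ON tp.ID = pd.ID;

Comparing the estimated costs of the hinted and unhinted plans shows why the optimizer preferred the scan.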
The execution plan shows that the table #ToPurge has 3.9 million rows. This will take 195 batches to clear out. Assuming that all rows in #ToPurge are unique and initially exist in Production_Detail, and that the rows output from #ToPurge are in a consistent order for each batch (I'm ignoring the parallelism), then each successive batch will end up seeking all rows that have already been deleted in previous batches before finally getting to a patch of rows that have not yet been processed. In total, your delete process will do more than a third of a billion index seeks for already deleted rows.
0 + 20,000 + 40,000 + ... + 3,880,000 = 20,000 × (0 + 1 + ... + 194) = 20,000 × (194 × 195 / 2) = 378,300,000
This isn't the logic that the cost-based optimizer uses (which will involve the histograms of the two tables and a row goal for the TOP 20000), but it shows that this is not a good strategy anyway.
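As an aside, if you want to see what plan you would get without the row goal, SQL Server 2016 SP1 and later let you disable it with a query hint. The statement below is only my guess at the shape of your original batched delete, for illustration:

-- Hypothetical reconstruction of the original delete, with the row goal disabled
DELETE TOP (20000) pd
FROM [DataWarehouse].[dbo].[Production_Detail] pd
     INNER JOIN #ToPurge tp
         ON tp.ID = pd.ID
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));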
You could try the following to cap the number of seeks at 20,000 per iteration and see if you now get your desired plan.
CREATE TABLE #ToPurge
(
    [BatchId] [INT] NULL,
    [Id] [BIGINT] NOT NULL,
    /*Clustering on (BatchId, Id) lets each batch be read with a single range seek*/
    UNIQUE CLUSTERED([BatchId], [Id])
);
/*Your insert statement to #ToPurge here*/
/*Number the rows and assign a batch id so that each batch gets exactly 20,000 rows*/
WITH CTE
AS (SELECT *,
           1 + ( ( ROW_NUMBER() OVER (ORDER BY Id) - 1 ) / 20000 ) AS _BatchId
    FROM #ToPurge)
UPDATE CTE
SET [BatchId] = [_BatchId];
DECLARE @BatchCounter INT = 1;

WHILE @BatchCounter <= (SELECT MAX([BatchId]) FROM #ToPurge)
BEGIN
    DELETE pd
    FROM #ToPurge tp
         INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd
             ON tp.ID = pd.ID
    WHERE tp.BatchId = @BatchCounter;

    SET @BatchCounter += 1;
END;
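One small tweak worth considering (my addition, not part of the suggestion above): the WHILE condition re-scans #ToPurge for MAX(BatchId) on every iteration. Since the table is not modified inside the loop, you could capture it once:

/*Capture the batch count once instead of re-evaluating it every iteration*/
DECLARE @MaxBatchId INT = (SELECT MAX([BatchId]) FROM #ToPurge);
DECLARE @BatchCounter INT = 1;

WHILE @BatchCounter <= @MaxBatchId
BEGIN
    DELETE pd
    FROM #ToPurge tp
         INNER JOIN [DataWarehouse].[dbo].[Production_Detail] pd
             ON tp.ID = pd.ID
    WHERE tp.BatchId = @BatchCounter;

    SET @BatchCounter += 1;
END;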
Best Answer
I can think of two possible explanations for why you experience poor performance with that query.
The first relates to blocking. Have you checked the wait events while the query is running? If it is blocked by another session and waiting for a lock, then you don't have a performance problem. Instead, you have a concurrency problem.
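A quick way to check this while the delete is running is a sketch like the following, using the standard sys.dm_exec_requests DMV (replace the session id with the one running your DELETE):

-- What is session 57 (hypothetical spid) waiting on, and who is blocking it?
SELECT r.session_id,
       r.status,
       r.command,
       r.wait_type,
       r.wait_time,          -- ms spent in the current wait
       r.blocking_session_id -- non-zero means another session holds the lock
FROM sys.dm_exec_requests AS r
WHERE r.session_id = 57;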
The second relates to the table and index structure of the table that you are deleting from. The PriceDiffs table has 194,807,000 rows. If you don't have an index that can seek on the UserID column, it could take a long time to find the single row to delete. There's also a scan on the PriceDifferenceRecords table, which I assume is there to enforce a constraint or a foreign key, but that table only has 15,843 rows.

You might be wondering why the query takes two minutes to run when it has a relatively low estimated subtree cost of 0.0569578. This is probably due to the row goal introduced to the query by your use of TOP 1. SQL Server scans the IX_PriceDiffs_PriceDate index to find a matching row, but the filter is implemented as a predicate instead of a seek predicate. I assume this is because UserID is not a key column. Thus, SQL Server scans the index in order until it finds a row where UserID = 384166. If you are really lucky, the first matching row could be in the first page of the index. If you are really unlucky, the first matching row could be in the last page of the index, so your DELETE query would scan the entire table to delete just one row.

It's worth noting that deleting just one row from a table often isn't a good way to test performance when you need to delete lots of rows. As a starting point, consider creating an index on UserID and deleting in batches.
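To make that concrete, here is a minimal sketch of both suggestions. The index name is my invention, and I'm assuming the table lives in dbo and that the goal is to purge all rows for one user:

-- Hypothetical index to support seeks on UserID
CREATE NONCLUSTERED INDEX IX_PriceDiffs_UserID
    ON dbo.PriceDiffs (UserID);

-- Delete in batches so each statement touches a bounded number of rows
DECLARE @RowsDeleted INT = 1;

WHILE @RowsDeleted > 0
BEGIN
    DELETE TOP (5000)
    FROM dbo.PriceDiffs
    WHERE UserID = 384166;

    SET @RowsDeleted = @@ROWCOUNT;
END;

Batching keeps each transaction (and its locks and log usage) small, which avoids the long single-statement scan described above.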