SQL Server – Should I lock the table during delete?


I have a table with 4 million records. It has a clustered index on a date column (the record creation date). Five other tables reference this table, and all of the foreign key columns are indexed.

The machine has no downtime. I had a program that cleans up records older than 31 days. It creates a connection, deletes the TOP 1000 matching rows, closes the connection, and repeats until all old records are removed.
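For reference, each batch is essentially a statement like this (the table and column names below are simplified placeholders, not our real schema):

DELETE TOP (1000)
FROM VehicleLocation                               -- placeholder name for the 4-million-row table
WHERE CreatedDate < DATEADD(DAY, -31, GETDATE());  -- rows older than 31 days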

The delete has been very slow; it is removing about 1,000 rows per 10 seconds. Ideally I want to delete 1,000 rows per second.

I noticed that during the delete it is taking a lot of page locks on the index.
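I saw this by watching sys.dm_tran_locks (available from SQL 2005 on) from another session while a batch was running, with something like the following (52 here is just an example SPID for the deleting session):

SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 52    -- SPID of the deleting session
GROUP BY resource_type, request_mode
ORDER BY lock_count DESC;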

I am wondering if there is a faster way to delete the data without causing timeouts.

My idea is this: would it be better to take a table lock, perform the delete, then wait a second so it doesn't time out other transactions before performing the next delete?
My guess is that a table lock would reduce the number of row and page locks, which may speed up the delete.
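In code, what I have in mind is roughly this (a sketch only; the TABLOCK hint and the placeholder names are my assumption of how it would look):

DELETE TOP (1000)
FROM VehicleLocation WITH (TABLOCK)   -- one table lock instead of many page/row locks
WHERE CreatedDate < DATEADD(DAY, -31, GETDATE());

WAITFOR DELAY '00:00:01';             -- pause so other transactions don't time out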

Any suggestions on this issue would help.

Please note that neither the hard drive nor the database is fragmented, and it is a RAID 10 machine.

[Update]
Thanks for asking for the execution plan. It turns out the live environment behaves differently from my development environment: it is doing an index scan rather than an index seek. I have to investigate more into why it would do an index scan.
(screenshot: Estimated Execution Plan)

[Update 2] Here are the indexes we have for some of those tables. Our index naming convention is [TableName]_[ColumnName]; sorry we didn't use the MSSQL naming standard. In addition, it turns out that the client has a 96% fragmented index (VehicleLocationTP_VehicleLocationKey), which is definitely one of the problems. It may be a reason why SQL 2005 would use an index scan rather than an index seek.
(screenshot of the index definitions)
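For what it's worth, this is roughly how we measured the fragmentation (sys.dm_db_index_physical_stats, SQL 2005 and later):

SELECT i.name AS index_name,
       s.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('VehicleLocationTP'), NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id = s.index_id;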

[Update 3] I was finally able to test the delete query on their testing server instead of on my own computer. They are running SQL 2005 Standard, versus SQL 2008 R2 Express on my machine. The indexes were about 95% fragmented, and rebuilding them improved the delete by 25-50%. It is hard to do performance testing while their SQL Server is constantly busy. However, the actual execution plan is the same as the estimated one, so you are right that fragmentation doesn't affect the execution plan. My guess is that it could be the number of rows in the table: maybe if the table is small, it would use an index scan rather than an index seek.
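The rebuild itself was just the standard ALTER INDEX; for the worst index it was along these lines:

ALTER INDEX VehicleLocationTP_VehicleLocationKey
ON VehicleLocationTP
REBUILD;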

In addition, this article gave me a bit more insight into why it is an index scan: Index Scan vs Index Seek

When the execution plan shows an index scan, it is really scanning the entire table. It is called an index scan because VehicleLocationAPC is a clustered-index table. That removes a bit of the confusion: it means the index wasn't used, and it was doing an entire table scan.

Another thing to realize is the nature of the data in VehicleLocationAPC: VehicleLocationKey values are almost always unique. Our application generates one VehicleLocationAPC row per VehicleLocation row. My guess is that, because of this, SQL Server would rather scan the entire table than use the index… but I could be wrong, because I would have thought the index is sorted as a B-tree, which should make scanning the keys faster than doing a table scan.

My focus now turns to VehicleLocationTP; this is the table that accounts for 63% of the estimated cost, and it is huge.

Best Answer

Changing the locking to table locks will just make the deletes run even slower, as the delete won't be able to start until the lock can be taken on the table, which means that all other threads need to be finished or blocked. If you have foreign keys with cascading deletes enabled, those will probably take a lot of the time.
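You can check whether any of the foreign keys cascade with a quick query against sys.foreign_keys (SQL 2005 and later; substitute your parent table's name):

SELECT name, delete_referential_action_desc
FROM sys.foreign_keys
WHERE referenced_object_id = OBJECT_ID('YourTable');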

You might want to change it to a SQL Server Agent job so that, instead of running your app (which connects and disconnects), you just run a loop that deletes data until you are done.

SELECT NULL -- Seeds @@ROWCOUNT to 1 so the WHILE loop starts.
WHILE @@ROWCOUNT <> 0
BEGIN
    -- Each pass deletes one small batch, keeping each transaction short.
    DELETE TOP (1000) FROM YourTable
    WHERE Column < GETDATE() - 31
END
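If even that loop blocks other sessions too much, a variant with a short pause between batches gives other transactions room to breathe. This is just a sketch of the same idea (it captures @@ROWCOUNT in a variable, since later statements reset it):

DECLARE @rows INT
SET @rows = 1
WHILE @rows > 0
BEGIN
    DELETE TOP (1000) FROM YourTable
    WHERE Column < GETDATE() - 31

    SET @rows = @@ROWCOUNT    -- capture before anything else resets it

    WAITFOR DELAY '00:00:01'  -- let waiting transactions through
END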

If this doesn't work, you could look into table partitioning, which would allow you to switch the old data to another table very quickly and then truncate that table. This, however, requires Enterprise Edition.
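A minimal sketch of that pattern, assuming the table were partitioned by the date column (the staging table and partition number here are hypothetical, and the staging table must have an identical structure on the same filegroup):

-- Move the oldest partition into an empty staging table (a metadata-only operation),
-- then throw the rows away cheaply.
ALTER TABLE YourTable
SWITCH PARTITION 1 TO YourTable_Staging

TRUNCATE TABLE YourTable_Staging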