Using SQL Server 2012 Standard. I'm running a delete on a table based on the contents of another table. It's taking rather a long time (5 hours) and doesn't seem optimal to me; I'd appreciate some input on optimising the statement:
delete from [dbo].[tbl1]
where exists (
    select *
    from [dbo].[tbl2] t
    where [dbo].[tbl1].[col1] = t.[col1]
      and [dbo].[tbl1].[col2] = t.[col2]
      and [dbo].[tbl1].[col3] = t.[col3]
)
The columns are as follows:
tbl1.col1 varchar(10)
tbl1.col2 datetime
tbl1.col3 varchar(60)
tbl2.col1 varchar(10)
tbl2.col2 datetime
tbl2.col3 varchar(30)
I realise that the datatype on col3 differs between the two tables (varchar(60) vs varchar(30)). I know this is bad, but would it mean the index cannot be used?
There is a non-unique clustered index on each table (which does not cover this query) and a non-clustered index on both tables, covering all three columns used in the where clause.
tbl1 contains ~1.2 billion rows and tbl2 contains ~30 million rows. I'm expecting around 30 million rows to be deleted from tbl1.
Any help appreciated!
EDIT: FYI, tbl1 and tbl2 are on different filegroups, but on the same disk (SAN). Also, here is the execution plan:
Best Answer
I've run into this kind of performance issue before, and breaking the delete up into discrete steps of a fixed size is what ended up doing the trick. It allows SQL Server to commit changes more frequently, which is generally easier on the transaction log.
The optimal batch size depends on your server and what it can handle. I've found 100,000 to be a safe size for me, and we've not got anything particularly powerful, so you may be able to get away with more.
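As a sketch of what that looks like (table and column names taken from the question; the 100,000 batch size is just my starting point, tune it for your hardware), a batched version of the delete in T-SQL might be:

```sql
-- Assumed batch size; adjust to what your server and log can absorb.
declare @batchsize int = 100000;

while 1 = 1
begin
    -- DELETE TOP (n) removes at most n matching rows per iteration,
    -- so each batch commits separately and log space can be reused.
    delete top (@batchsize)
    from [dbo].[tbl1]
    where exists (
        select 1
        from [dbo].[tbl2] t
        where [dbo].[tbl1].[col1] = t.[col1]
          and [dbo].[tbl1].[col2] = t.[col2]
          and [dbo].[tbl1].[col3] = t.[col3]
    );

    -- Stop once a batch deletes fewer rows than the batch size,
    -- i.e. the last partial batch has been processed.
    if @@ROWCOUNT < @batchsize
        break;
end
```

If the database is in FULL recovery, taking log backups between batches (or a CHECKPOINT in SIMPLE recovery) keeps the transaction log from growing across the whole 30-million-row delete.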