I am performing a delete operation on a very large SQL Server table, based on the query below.
delete db.st_table_1
where service_date between (select min(service_date) from stg_table)
and (select max(service_date) from stg_table);
Neither stg_table nor stg_table_1 has an index on service_date.
Both of these tables are loaded with millions of rows of data, and the delete operation is taking a lot of time. I am requesting your suggestions to improve the performance of this query.
I referred to the strategy described in the question below, but I couldn't understand how to implement it.
How to delete large amount of data in sql server without data loss?
Any guidance on this would be appreciated.
Update:
select * into db.temp_stg_table_1
from db.stg_table_1
where service_date not between (select min(service_date) from db.stg_table)
and (select max(service_date) from db.stg_table);
exec sp_rename 'stg_table_1', 'stg_table_1_old'
exec sp_rename 'temp_stg_table_1', 'stg_table_1'
drop table stg_table_1_old
How about if I go with the above logic to delete the millions of records? Are there any advantages or disadvantages to that approach?
Best Answer
Testing based on your comments
Tested on SQL Server 2014 SP3
DDL
PKs and clustered indexes on the identity fields.
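The exact DDL isn't reproduced here, but a minimal sketch matching that description could look like this (column names and datatypes are assumptions):

-- Two heap-free test tables: identity PK as the clustered index,
-- no index on service_date (matching the question).
CREATE TABLE dbo.st_table_1
(
    id           int IDENTITY(1,1) NOT NULL,
    service_date date NOT NULL,
    CONSTRAINT PK_st_table_1 PRIMARY KEY CLUSTERED (id)
);

CREATE TABLE dbo.stg_table
(
    id           int IDENTITY(1,1) NOT NULL,
    service_date date NOT NULL,
    CONSTRAINT PK_stg_table PRIMARY KEY CLUSTERED (id)
);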
DML
2.5M rows in dbo.st_table_1 and 5M rows in dbo.stg_table.
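The exact load isn't shown either; a sketch of how test data like this could be generated (the date range and the cross-join row source are assumptions):

-- Random service dates within one year, so (almost) every row in
-- dbo.st_table_1 falls inside stg_table's MIN/MAX range.
INSERT INTO dbo.st_table_1 (service_date)
SELECT TOP (2500000)
       DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 365), '20190101')
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

INSERT INTO dbo.stg_table (service_date)
SELECT TOP (5000000)
       DATEADD(DAY, ABS(CHECKSUM(NEWID()) % 365), '20190101')
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;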
(Almost) all of these 2.5M rows will be deleted by the query. This is more than 10 times less data than yours.

Running your query
The actual execution plan for your base delete statement
As expected, dbo.stg_table is accessed twice to get the MAX & MIN values with a stream aggregate. The CPU time & elapsed execution time:

A missing index hint is added to the execution plan:
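The suggested index would look roughly like this (the index name is illustrative):

-- Index on the delete target's filter column, as hinted by the plan.
CREATE NONCLUSTERED INDEX IX_st_table_1_service_date
    ON dbo.st_table_1 (service_date);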
However, when we add the index, an extra sort appears to delete the rows from this newly added index:
The plan
And the CPU time / elapsed time increases:
YMMV, but in my example, based on your comments about the data, it did not improve the query.
Creating an index on [dbo].[stg_table]:
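A sketch of such an index (the index name is illustrative):

-- Index on stg_table.service_date so MIN/MAX become single-row seeks.
CREATE NONCLUSTERED INDEX IX_stg_table_service_date
    ON dbo.stg_table (service_date);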
As a result, the MAX() and MIN() can leverage the newly created index to return only one row each instead of doing a full clustered index scan.

With the execution time improved:
And the execution plan
But this is only based on indexing & my own example. Proceed at your own risk.
Extra Notes
You should look into splitting that delete into separate batches, so it won't fill up the log file and you don't have one big all-or-nothing block of failed / succeeded deletes. A sketch follows below.
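A minimal batched-delete sketch, assuming service_date is a date column (the batch size and variable names are illustrative):

DECLARE @min_date date,
        @max_date date,
        @batch_size int = 50000; -- tune to your log size / locking tolerance

-- Compute the range once instead of once per batch.
SELECT @min_date = MIN(service_date),
       @max_date = MAX(service_date)
FROM dbo.stg_table;

WHILE 1 = 1
BEGIN
    DELETE TOP (@batch_size)
    FROM dbo.st_table_1
    WHERE service_date BETWEEN @min_date AND @max_date;

    IF @@ROWCOUNT = 0 BREAK; -- nothing left to delete
END;

Each batch commits on its own, so log space can be reused between batches (in SIMPLE recovery, or with frequent log backups in FULL), and a failure only rolls back the batch that was running.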
You could also consider using (TABLOCK) so the entire table is locked from the very beginning.
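For example (a sketch based on the tables above):

DELETE dbo.st_table_1 WITH (TABLOCK)
WHERE service_date BETWEEN (SELECT MIN(service_date) FROM dbo.stg_table)
                       AND (SELECT MAX(service_date) FROM dbo.stg_table);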
Update: SELECT INTO + sp_rename
Apart from performance, sp_rename needs a Sch-M lock to complete, meaning that it has to wait for all other sessions to release their locks on the table before it can be modified. Any indexes/constraints on the original table will be gone and you will have to recreate them.

When I run the query on my own data:
This does not represent your data; keep that in mind.
It is reading all of the rows just to return 0 rows, which is not optimal.
With a high execution time:
But this is not really meaningful without more information regarding your data. An actual query plan would be needed to give more accurate advice.