Sql-server – Single statement or batched delete

deletesql serversql-server-2008-r2

I need to delete 6.3 million records from 25 tables in production. I have the following two queries, but I don't know which one will be best:

delete jc 
from Jtable jc
join J1Table j 
on j.JobId = jc.JobId 
join nonJobReProcess n
on n.jobid = j.JobId

or:

while exists (select top 1 * from dbo.Jtable jc (nolock)
join J1Table j (nolock)
on j.JobId = jc.JobId 
join nonJobReProcess  n  (nolock)
on n.jobid = j.JobId)
begin 

delete  top (10000) jc 
from dbo.Jtable jc (nolock)
join J1Table j (nolock)
on j.JobId = jc.JobId 
join nonJobReProcess n  (nolock)
on n.jobid = j.JobId

end

Please explain the reasons I might choose one over the other, or any alternative methods.

Best Answer

Using the second method will result in less locking, transaction log usage and Tempdb resources, because SQL Server doesn't have to perform all of the delete operations in one transaction.

However, instead of the "while exists" statement, you can use a variable that will get the value of @@rowcount after the delete. This can help a little more with performance.

Related Solutions

Oracle 11g – Best Way to Delete Very Large Recordsets

The logic with 'A' and 'B' might be "hidden" behind a virtual column on which you could do the partitioning:

alter session set nls_date_format = 'yyyy-mm-dd';
drop   table tq84_partitioned_table;

create table tq84_partitioned_table (
  status varchar2(1)          not null check (status in ('A', 'B')),
  date_a          date        not null,
  date_b          date        not null,
  date_too_old    date as
                       (  case status
                                 when 'A' then add_months(date_a, -7*12)
                                 when 'B' then            date_b
                                 end
                        ) virtual,
  data            varchar2(100) 
)
partition   by range  (date_too_old) 
( 
  partition p_before_2000_10 values less than (date '2000-10-01'),
  partition p_before_2000_11 values less than (date '2000-11-01'),
  partition p_before_2000_12 values less than (date '2000-12-01'),
  --
  partition p_before_2001_01 values less than (date '2001-01-01'),
  partition p_before_2001_02 values less than (date '2001-02-01'),
  partition p_before_2001_03 values less than (date '2001-03-01'),
  partition p_before_2001_04 values less than (date '2001-04-01'),
  partition p_before_2001_05 values less than (date '2001-05-01'),
  partition p_before_2001_06 values less than (date '2001-06-01'),
  -- and so on and so forth..
  partition p_ values less than (maxvalue)
);

insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('B', date '2008-04-14', date '2000-05-17', 
 'B and 2000-05-17 is older than 10 yrs, must be deleted');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('B', date '1999-09-19', date '2004-02-12', 
 'B and 2004-02-12 is younger than 10 yrs, must be kept');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('A', date '2000-06-16', date '2010-01-01', 
 'A and 2000-06-16 is older than 3 yrs, must be deleted');


insert into tq84_partitioned_table (status, date_a, date_b, data) values 
('A', date '2009-06-09', date '1999-08-28', 
 'A and 2009-06-09 is younger than 3 yrs, must be kept');

select * from tq84_partitioned_table order by date_too_old;

-- drop partitions older than 10 or 3 years, respectively:

alter table tq84_partitioned_table drop partition p_before_2000_10;
alter table tq84_partitioned_table drop partition p_before_2000_11;
alter table tq84_partitioned_table drop partition p2000_12;

select * from tq84_partitioned_table order by date_too_old;

SQL Server Optimization – Most Efficient DELETE Method

The best solution to this problem is nearly always method 2, but with two tables UNION ALL instead of LEFT JOIN together (one to hold the deleted, one for the active). There are several reasons why this is superior:

You can maintain separate statistics on active and deleted rows. Assumption: you will have more deleted than active over time. This means that the "active rows" table stays small and doing fullscans is easier. It also makes things like index rebuilds (if you do them) faster on the active rows.
Statistics of old rows will not affect news rows. This helps you avoid skew problems.
Each table can live on it's own filegroup. This means you can move deleted rows off to cheaper storage.
You can have different indexing/partitioning strategies on the deleted rows vs. the active ones. For example, you may choose to use a column store index on the old rows if they are read via a scan often, but changed very rarely.
The deleted table can be taken offline with a table SWITCH (for maintenance) without disrupting the system too much.

The two table solution can be implemented with a DELETE instead of trigger that moves the rows to deleted instead of deleting them.

The only downside of solution 2 is that you will need to modify your queries to distinguish between deleted and not deleted rows and tables. This can be done via a view, but it is safer to avoid this if possible. The view can confuse the optimiser and there are cases where using a view instead of a table gives you horrible execution plans. If you do use a view, you should add an `IsDeleted' column to both tables and put a check constraint on it. This ensure that the optimiser will not try to seek BOTH table for every query (which would double your IOPS).

For completeness, and to answer your last question, there are some other ways to "solve" this:

Method 4: Add an IsDeleted column and partition on this column. Drawback: Does not allow separate statistics on the two partitions (statistics are table, not partition based, in SQL Server)

Method 5: Use a filtered index to apply different indexing (and secondary storage) on deleted and active rows. Drawback: Good luck getting the optimiser to behave properly with this method.

Best Answer

Related Solutions

Oracle 11g – Best Way to Delete Very Large Recordsets

SQL Server Optimization – Most Efficient DELETE Method

Related Question