Sql-server – the best way to remove large amount of data from large tables

sql serversql-server-2008sql-server-2008-r2

I have two large SQL 2008 tables:

  • table_header with 7 columns and ~ 2 million records
  • table_detail with 22 columns and ~ 78 million records

The common field between the two is called item_name which is varchar(32).

I need to remove all records from table_detail where item_name is not in table_header.

What is the best way to accomplish this task?

This is a busy production server. There is no maintenance scheduled anytime soon, but this data needs to be gone now. I can't just run something like:

delete from table_detail
where item_name not in (
select item_name from table_header);

Best Answer

I've faced such problems a few times so I have 2 tips for you.

1 Use this script that breaks your delete in packages so you will not run the risk of crashing the log

declare @var int, @ var2 int
set @var = (select count (1) from <Table_to_Delete> WHERE <Your-delete-clause>)
set @ var2 = 0

while @ var2 <> @var
begin
set rowcount 50000 - batchsize

print getdate ()
delete from <Table_to_Delete> WHERE <Your-delete-clause>

set @ var2 = @ var2 + @@ ROWCOUNT
print getdate ()

checkpoint - it's optional to avoid log grow

end

2) Now if the amount of rows that should remain in the table is really small, example if you are to delete 80 ~ 90% of the table I recommend you copy the table structure and copy the data to the new table. This option is valid if the table structure is simple because you will have to remember to copy all indices, foreing keys, and explicit permissions.