I am working on a process to remove 90% of a table's data because only 10% is needed for testing.
The best way I have found involves storing the 10% of the table's rows that are needed in a temporary table.
Current method
SELECT TOP (10) PERCENT *
INTO #temp_some_table
FROM some_table WITH (NOLOCK)
ORDER BY some_column DESC;

TRUNCATE TABLE some_table;

INSERT INTO some_table
SELECT *
FROM #temp_some_table;

DROP TABLE #temp_some_table;
This method is filling up tempdb, which in turn is filling up the disk.
Questions
Is there a more efficient way to delete 90% of a table's data? For example:
DELETE TOP (90) PERCENT FROM some_table;
or
Is there a way to insert 10% of some_table's data into a temporary table using batches? Something like this:
DECLARE @r INT = 1; -- must be initialized, or the WHILE never runs

WHILE @r > 0
BEGIN
    BEGIN TRANSACTION;

    INSERT INTO [dbo].[##temp_cds_Basket]
    SELECT TOP (10) PERCENT *
    FROM [dbo].[cds_basket] AS s;
    -- note: as written this would also need a WHERE clause to skip
    -- rows already copied, otherwise the loop never terminates

    SET @r = @@ROWCOUNT;
    PRINT @r;

    COMMIT TRANSACTION;
END
Possible solution
How about this?
SET NOCOUNT ON;

DECLARE @r INT;
DECLARE @TenPercentDate datetime;

WITH cte (some_column) AS
(
    SELECT TOP (10) PERCENT some_column
    FROM some_table WITH (NOLOCK)
    ORDER BY some_column DESC
)
SELECT @TenPercentDate = MIN(some_column)
FROM cte;

SELECT @TenPercentDate;

SET @r = 1;

WHILE @r > 0
BEGIN
    BEGIN TRANSACTION;

    DELETE TOP (10000)
    FROM some_table
    WHERE some_column < @TenPercentDate;

    SET @r = @@ROWCOUNT;
    PRINT @r;

    COMMIT TRANSACTION;
    -- CHECKPOINT; -- if the database uses the simple recovery model
END
-- ROLLBACK; -- for testing
Best Answer
I have three different ideas that I mentioned in a comment above. Here is a little elaboration on at least one of them (which you are stuck on due to self-diagnosed tunnel vision).
I was thinking about this, but we don't really need to do the compare row by row; we can just figure out what datetime value sits at the edge of what we want to keep, and delete chunks of older rows until there are none left. For example:
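A sketch of that batched delete, assuming `some_column` is the datetime column that determines which rows are newest (the names mirror the tables in the question):

```sql
DECLARE @cutoff datetime, @rc INT = 1;

-- find the oldest value among the newest 10% of rows;
-- everything older than this is safe to delete
SELECT @cutoff = MIN(some_column)
FROM
(
    SELECT TOP (10) PERCENT some_column
    FROM dbo.some_table
    ORDER BY some_column DESC
) AS keepers;

-- delete the other ~90% in small chunks to limit
-- transaction log growth and lock escalation
WHILE @rc > 0
BEGIN
    DELETE TOP (10000)
    FROM dbo.some_table
    WHERE some_column < @cutoff;

    SET @rc = @@ROWCOUNT;
END
```

Each `DELETE TOP (10000)` is its own implicit transaction, so the log can be reused between chunks (especially with a `CHECKPOINT` under the simple recovery model).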
You can see this post for some additional ways to enhance this.
I think this would be a valid option and should cause less pain than the above.
This one is also less painful, but feel free to inject some debugging in there to be sure that you got the data you wanted before you perform the drop.
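The code for those two alternatives did not survive here, but based on the surrounding description (both involve creating a new table and dropping the original), a sketch of the copy-and-swap pattern might look like this, using the question's table and column names:

```sql
-- copy the 10% to keep into a brand-new table
SELECT TOP (10) PERCENT *
INTO dbo.some_table_new
FROM dbo.some_table
ORDER BY some_column DESC;

-- verify the copy before dropping anything, e.g.:
-- SELECT COUNT(*) FROM dbo.some_table_new;

DROP TABLE dbo.some_table;

EXEC sp_rename N'dbo.some_table_new', N'some_table';
```

Note that `SELECT ... INTO` does not copy indexes, constraints, triggers, or permissions, which is exactly why the caveats below matter.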
Note that in both of these latter cases there may be constraints, schema-bound views, etc. that prevent you from simply dropping the table (an inbound foreign key is obviously not an issue, since you are able to truncate, but outbound references may still need to be dealt with). There will also be ramifications for existing plans (though deleting 90% of the data should cause statistics to update and invalidate plans anyway, so really no different), and as @Kenneth points out below, you will need to re-establish permissions on the new table (so you might want to be sure you can script those in advance).