SQL Server – How to Copy Database and Clear Data Older Than N Days

copydeletesql server

I have a sizable production database (140GB) on MS SQL Server 2012, which I'm trying to automatically download to our development environments on a periodical basis (every week or on demand). I'm not a DBA, but I have some understanding of the basic SQL commands I could use. However, I don't really know the best, most efficient way to implement this and I'd love some feedback and advice.

I came up with the following plan:

  1. backup production
  2. restore a copy of production to a copy
  3. strip down (old data) from copy
  4. reduce space usage of the copy
  5. backup copy so it can be downloaded

There are a couple of things I noted:

  1. I really cannot risk the clean down script running against the production database.
  2. The production database is in use 24/7/365
  3. I need to delete a lot of data (out of 10 years worth of data, keep the last 6 months)
  4. I'm cleaning out millions of rows in multiple tables.
  5. To prevent transaction log growth getting out of hand, I should use an explicit transaction to avoid log flushes
  6. Similarly, since I don't require recovery of the database copy, I can switch to simple recovery mode.

My (single) script looks like this:

/*
This script is designed to delete all orders older than 6 months old.
It uses a view, which limits the number of claims it brings back
every time it runs. Running the script in this way should reduce the load
on the transaction log, which otherwise would get out of hand. We also
run this in simple recovery mode for a similar reason. For more information
on how this script is structured, please see the following resources:

 - http://web.archive.org/web/20100212155407/http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx
 - http://stackoverflow.com/a/18292136/119624
 - http://dba.stackexchange.com/a/63773/45757
 - http://www.sqlservercentral.com/blogs/sqldbauk/2011/07/11/can-you-delete-from-a-view_3F00_/
 - http://www.sqlbadpractices.com/should-you-shrink-your-database-in-your-maintenance-plan/
 - http://stackoverflow.com/questions/2043726/best-way-to-copy-a-database-sql-server-2008
*/
USE [master];
GO

-- backup the existing PRODUCTION database
BACKUP DATABASE PROD
TO DISK = 'E:\temp\PROD.bak' -- some writeable folder. I put date and notes in the file name too
WITH COPY_ONLY

-- we need to delete the existing PRODCOPY
IF DB_ID('PRODCOPY') IS NOT NULL
BEGIN
  ALTER DATABASE PRODCOPY SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
  DROP DATABASE PRODCOPY;
END
GO

-- restore that copy
RESTORE FILELISTONLY
   FROM PROD
-- Restore the files for PROD.
RESTORE DATABASE PRODCOPY
   FROM DISK = 'E:\temp\PROD.bak'
   WITH NORECOVERY,
   MOVE 'PROD_Data' TO 'D:\PROD_Data.mdf', 
   MOVE 'PROD_Log' TO 'D:\PROD_Log.ldf'
GO

-- let's go into simple recovery mode
-- http://stackoverflow.com/a/18292136/119624
ALTER DATABASE PRODCOPY SET RECOVERY SIMPLE
GO

-- Make sure we are now using that Db
USE PRODCOPY
GO

-- let's create a view to speed up the execution plan on the deletes
-- drop and create
IF Object_ID('vw_DeleteOrder') IS NOT NULL
    DROP VIEW vw_DeleteOrder
GO

/* 
This is going to constrain our view to all orders 
older than 3 months old, and it is a rolling view, 
which updates as we subsequently start deleting 
orders from the table.
Recommended by the MySpace SQL team: 
http://web.archive.org/web/20100212155407/http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx
*/
CREATE VIEW vw_DeleteOrder
AS 
    SELECT TOP (1000) * FROM [Order]
    WHERE DateCreated < DATEADD(month, -6, GETDATE())
    ORDER BY OrderId DESC 
GO

IF (DB_NAME() LIKE 'PRODCOPY')
BEGIN

    -- We just want to remove everything from this table
    IF (EXISTS (SELECT NULL FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'SystemLog'))
    BEGIN
        TRUNCATE TABLE [SystemLog]
    END

    /* 
    We need these temp tables because we need to delete
    tables related to these tables, but they don't have
    an orderId column. Therefore we cannot delete them 
    directly, or we need to delete them after the order
    is deleted.
    */
    DECLARE @AddressIds TABLE(Id INT)
    DECLARE @CustomerIds TABLE(Id INT)

    ----  Caution : perform actual deletes now !!
    --- start an Explicit transaction to avoid Log flushes.

    SET NOCOUNT ON;
    DECLARE @r INT;
    SET @r = 1;

    WHILE @r > 0
    BEGIN
        BEGIN TRANSACTION;

        -- Data for linked deletes
        INSERT INTO @AddressIds SELECT AddressId FROM Customer WHERE CustomerId IN (SELECT CustomerId FROM [Order] WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder))
        INSERT INTO @CustomerIds SELECT CustomerId FROM [Order] WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder)

        -- Delete linked data
        DELETE FROM OrderLine WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder)
        DELETE FROM OrderStatistic WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrders)

        -- Random links to orders
        UPDATE OrderProcessing SET AssignedOrderId = NULL WHERE AssignedOrderId IN (SELECT OrderId FROM vw_DeleteOrder)

        -- Delete customers from defined list (need to check multiple use addresses)
        DELETE FROM Customer WHERE CustomerId IN (SELECT DISTINCT CustomerId FROM @CustomerIds)

        -- Delete addresses from defined list (need to check multiple use customers)
        DELETE FROM Address WHERE AddressId IN (SELECT DISTINCT AddressId FROM @AddressIds)

        -- Delete records from view (you can do this when a view targets a single table)
        -- See http://www.sqlservercentral.com/blogs/sqldbauk/2011/07/11/can-you-delete-from-a-view_3F00_/
        DELETE FROM vw_DeleteOrder;

        SET @r = @@ROWCOUNT;

        IF(@@ERROR <> 0)
            ROLLBACK TRANSACTION;
        ELSE
            COMMIT TRANSACTION;

        -- CHECKPOINT;    -- if simple
        CHECKPOINT;
        -- BACKUP LOG ... -- if full
    END

    -- Rebuild affected indexes
    DBCC DBREINDEX ('Order')
    DBCC DBREINDEX ('OrderLine')
    DBCC DBREINDEX ('OrderStatistic')
    DBCC DBREINDEX ('Customer')
    DBCC DBREINDEX ('Address')

    -- http://www.sqlbadpractices.com/should-you-shrink-your-database-in-your-maintenance-plan/
    -- Shrink database files
    -- NOTE: You must be within the context of the appropriate database to execute this
    DBCC SHRINKFILE(1,NOTRUNCATE)
    DBCC SHRINKFILE(1,TRUNCATEONLY) -- shrink mdf file
    DBCC SHRINKFILE(2,TRUNCATEONLY) -- chrink ldf file

END -- Must be PRODCOPY database

I was wondering about splitting this whole process into two separate SQL jobs. One to do the backup and restore (running as an sa) and then creating a separate user that only has access to the PRODCOPY database, that then runs the clean up section of this script. Any thoughts on this?

I assume this is a common task that many DBAs have to do frequently. If anyone has any thoughts, ideas or improvements then I'd love to hear them. This script has been put together through a number of posts I found on the internet. I referenced then where appropriate.

Best Answer

One to do the backup and restore (running as an sa) and then creating a separate user that only has access to the PRODCOPY database, that then runs the clean up section of this script. Any thoughts on this?

This is a good idea. You can have 2 jobs

  1. Perform backup and restore of PROD database to dev environments
  2. Once above is completed, kick off the scrubbing script to purge old data.

Some things to consider in your script :

  • If the users are connected to the PRODCOPY, make sure to disconnect them using

    alter database db_name
    set single_user with rollback immediate
    waitfor delay '00:00:05'
    alter database db_name
    set multi_user   
    
  • Backing up PROD database, if you are on sql server 2008R2 and up, then standard edition supports BACKUP compression. You should use that as well.

    -- we need to delete the existing PRODCOPY
    
  • Make sure you have Instant File Initialization to cut down the restore time.

  • do not drop the database, instead restore with replace. With Replace will NOT require pages with identical data to be rewritten there by cutting down the restore time.

  • Your subsequent script will fail as you are restoring the database, but still putting it in NORECOVERY. You should change it to RECOVERY.

    -- We just want to remove everything from this table
    
  • A good coding practice, avoid using INFORMATION_SCHEMA. Instead use sys.tables. Also, always use schema.tableName instead of just tableName.

    The references are to Aaron Bertrand's blog that explains in more detail

  • Change the sequence of shrink and rebuild indexes. The Shrink should be first followed by a rebuild or reorg and update stats.

  • Use the new syntax of alter index rebuild | reorganize. To automate the above point, you can use Ola Hallengren's - SQL Server Index and Statistics Maintenance