I have a sizable production database (140 GB) on MS SQL Server 2012 that I'm trying to automatically copy down to our development environments on a periodic basis (weekly or on demand). I'm not a DBA, but I do understand the basic SQL commands I could use. However, I don't really know the best, most efficient way to implement this, and I'd love some feedback and advice.
I came up with the following plan:
- backup production
- restore a copy of production to a copy
- strip down (old data) from copy
- reduce space usage of the copy
- backup copy so it can be downloaded
There are a couple of things I noted:
- I really cannot risk the clean down script running against the production database.
- The production database is in use 24/7/365
- I need to delete a lot of data (out of 10 years worth of data, keep the last 6 months)
- I'm cleaning out millions of rows in multiple tables.
- To prevent transaction log growth from getting out of hand, I should delete in small batches inside explicit transactions, checkpointing between batches
- Similarly, since I don't require point-in-time recovery of the database copy, I can switch it to the SIMPLE recovery model.
My (single) script looks like this:
/*
This script is designed to delete all orders more than 6 months old.
It uses a view, which limits the number of rows it brings back
every time it runs. Running the script in this way should reduce the load
on the transaction log, which otherwise would get out of hand. We also
run this in simple recovery mode for a similar reason. For more information
on how this script is structured, please see the following resources:
- http://web.archive.org/web/20100212155407/http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx
- http://stackoverflow.com/a/18292136/119624
- http://dba.stackexchange.com/a/63773/45757
- http://www.sqlservercentral.com/blogs/sqldbauk/2011/07/11/can-you-delete-from-a-view_3F00_/
- http://www.sqlbadpractices.com/should-you-shrink-your-database-in-your-maintenance-plan/
- http://stackoverflow.com/questions/2043726/best-way-to-copy-a-database-sql-server-2008
*/
USE [master];
GO
-- backup the existing PRODUCTION database
BACKUP DATABASE PROD
TO DISK = 'E:\temp\PROD.bak' -- some writeable folder. I put date and notes in the file name too
WITH COPY_ONLY;
GO
-- we need to delete the existing PRODCOPY
IF DB_ID('PRODCOPY') IS NOT NULL
BEGIN
ALTER DATABASE PRODCOPY SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE PRODCOPY;
END
GO
-- restore that copy
RESTORE FILELISTONLY
FROM DISK = 'E:\temp\PROD.bak'
-- Restore the files for PROD.
RESTORE DATABASE PRODCOPY
FROM DISK = 'E:\temp\PROD.bak'
WITH NORECOVERY,
MOVE 'PROD_Data' TO 'D:\PROD_Data.mdf',
MOVE 'PROD_Log' TO 'D:\PROD_Log.ldf'
GO
-- let's go into simple recovery mode
-- http://stackoverflow.com/a/18292136/119624
ALTER DATABASE PRODCOPY SET RECOVERY SIMPLE
GO
-- Make sure we are now using that Db
USE PRODCOPY
GO
-- let's create a view to speed up the execution plan on the deletes
-- drop and create
IF Object_ID('vw_DeleteOrder') IS NOT NULL
DROP VIEW vw_DeleteOrder
GO
/*
This is going to constrain our view to all orders
older than 6 months, and it is a rolling view,
which updates as we subsequently start deleting
orders from the table.
Recommended by the MySpace SQL team:
http://web.archive.org/web/20100212155407/http://blogs.msdn.com/sqlcat/archive/2009/05/21/fast-ordered-delete.aspx
*/
CREATE VIEW vw_DeleteOrder
AS
SELECT TOP (1000) * FROM [Order]
WHERE DateCreated < DATEADD(month, -6, GETDATE())
ORDER BY OrderId DESC
GO
IF (DB_NAME() LIKE 'PRODCOPY')
BEGIN
-- We just want to remove everything from this table
IF (EXISTS (SELECT NULL FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME = 'SystemLog'))
BEGIN
TRUNCATE TABLE [SystemLog]
END
/*
We need these temp tables because we need to delete
tables related to these tables, but they don't have
an orderId column. Therefore we cannot delete them
directly, or we need to delete them after the order
is deleted.
*/
DECLARE @AddressIds TABLE(Id INT)
DECLARE @CustomerIds TABLE(Id INT)
---- Caution : perform actual deletes now !!
--- start an Explicit transaction to avoid Log flushes.
SET NOCOUNT ON;
DECLARE @r INT;
SET @r = 1;
WHILE @r > 0
BEGIN
BEGIN TRANSACTION;
-- Clear the ID staging tables so each batch only holds its own IDs
DELETE FROM @AddressIds;
DELETE FROM @CustomerIds;
-- Data for linked deletes
INSERT INTO @AddressIds SELECT AddressId FROM Customer WHERE CustomerId IN (SELECT CustomerId FROM [Order] WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder))
INSERT INTO @CustomerIds SELECT CustomerId FROM [Order] WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder)
-- Delete linked data
DELETE FROM OrderLine WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder)
DELETE FROM OrderStatistic WHERE OrderId IN (SELECT OrderId FROM vw_DeleteOrder)
-- Random links to orders
UPDATE OrderProcessing SET AssignedOrderId = NULL WHERE AssignedOrderId IN (SELECT OrderId FROM vw_DeleteOrder)
-- Delete customers from defined list (need to check multiple use addresses)
DELETE FROM Customer WHERE CustomerId IN (SELECT DISTINCT Id FROM @CustomerIds)
-- Delete addresses from defined list (need to check multiple use customers)
DELETE FROM Address WHERE AddressId IN (SELECT DISTINCT Id FROM @AddressIds)
-- Delete records from view (you can do this when a view targets a single table)
-- See http://www.sqlservercentral.com/blogs/sqldbauk/2011/07/11/can-you-delete-from-a-view_3F00_/
DELETE FROM vw_DeleteOrder;
DECLARE @e INT;
-- Capture both values in one statement; a separate SET would reset @@ERROR
SELECT @r = @@ROWCOUNT, @e = @@ERROR;
IF (@e <> 0)
ROLLBACK TRANSACTION;
ELSE
COMMIT TRANSACTION;
CHECKPOINT; -- allows log truncation under the SIMPLE recovery model
-- BACKUP LOG ... -- use this instead if the copy stays in FULL recovery
END
-- Rebuild affected indexes
DBCC DBREINDEX ('Order')
DBCC DBREINDEX ('OrderLine')
DBCC DBREINDEX ('OrderStatistic')
DBCC DBREINDEX ('Customer')
DBCC DBREINDEX ('Address')
-- http://www.sqlbadpractices.com/should-you-shrink-your-database-in-your-maintenance-plan/
-- Shrink database files
-- NOTE: You must be within the context of the appropriate database to execute this
DBCC SHRINKFILE(1,NOTRUNCATE)
DBCC SHRINKFILE(1,TRUNCATEONLY) -- shrink mdf file
DBCC SHRINKFILE(2,TRUNCATEONLY) -- shrink ldf file
END -- Must be PRODCOPY database
I was wondering about splitting this whole process into two separate SQL jobs: one to do the backup and restore (running as sa), and one run by a separate user that only has access to the PRODCOPY database, which performs the clean-up section of this script. Any thoughts on this?
I assume this is a common task that many DBAs have to do frequently. If anyone has any thoughts, ideas or improvements, I'd love to hear them. This script was put together from a number of posts I found on the internet; I referenced them where appropriate.
Best Answer
This is a good idea. You can have two jobs.

Some things to consider in your script:

- If users are connected to `PRODCOPY`, make sure to disconnect them first (your `SET SINGLE_USER WITH ROLLBACK IMMEDIATE` already does this).
- When backing up the PROD database: if you are on SQL Server 2008 R2 and up, Standard Edition supports backup `COMPRESSION`. You should use that as well.
- Make sure you have Instant File Initialization enabled to cut down the restore time.
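As a sketch, a compressed copy-only backup might look like this (path and database name reused from the question's script):

```sql
-- Copy-only backup with compression (available in Standard Edition
-- from SQL Server 2008 R2 onwards)
BACKUP DATABASE PROD
TO DISK = 'E:\temp\PROD.bak'
WITH COPY_ONLY, COMPRESSION, STATS = 10;
```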
- Do not drop the database; instead, restore `WITH REPLACE`. `WITH REPLACE` will not require pages with identical data to be rewritten, thereby cutting down the restore time.
- Your subsequent script will fail because you restore the database but leave it in `NORECOVERY`. You should change it to `RECOVERY`.
- As a good coding practice, avoid using `INFORMATION_SCHEMA`; use `sys.tables` instead. Also, always use `schema.tableName` instead of just `tableName`. The references are to Aaron Bertrand's blog, which explains this in more detail.
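Putting those suggestions together, the restore and the table check might look like this (file names and paths taken from the question's script; the `dbo` schema is assumed):

```sql
-- Restore over the existing copy: WITH REPLACE avoids the drop,
-- RECOVERY brings the database online for the clean-up script
RESTORE DATABASE PRODCOPY
FROM DISK = 'E:\temp\PROD.bak'
WITH REPLACE, RECOVERY,
     MOVE 'PROD_Data' TO 'D:\PROD_Data.mdf',
     MOVE 'PROD_Log' TO 'D:\PROD_Log.ldf';

-- Existence check via sys.tables instead of INFORMATION_SCHEMA,
-- with an explicit schema qualifier
IF EXISTS (SELECT 1 FROM sys.tables
           WHERE name = N'SystemLog' AND schema_id = SCHEMA_ID(N'dbo'))
    TRUNCATE TABLE dbo.SystemLog;
```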
- Change the sequence of the shrink and the index rebuilds: the shrink should come first, followed by a rebuild or reorganize and a statistics update.
- Use the newer `ALTER INDEX ... REBUILD | REORGANIZE` syntax. To automate the above point, you can use Ola Hallengren's SQL Server Index and Statistics Maintenance solution.
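For example, the deprecated `DBCC DBREINDEX` calls could be replaced with the newer syntax, run after the shrink (assuming the tables live in the `dbo` schema):

```sql
-- Shrink first, then rebuild, so the shrink does not re-fragment
-- the freshly rebuilt indexes
DBCC SHRINKFILE(1, TRUNCATEONLY);
ALTER INDEX ALL ON dbo.[Order] REBUILD;
ALTER INDEX ALL ON dbo.OrderLine REBUILD;
ALTER INDEX ALL ON dbo.Customer REBUILD;
ALTER INDEX ALL ON dbo.Address REBUILD;
-- Refresh statistics on the heavily modified tables
UPDATE STATISTICS dbo.[Order];
```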