SQL Server – Best Way to Migrate an Existing Production Database to a New Physical Storage Structure

physical-design, sql-server, storage

I have recently taken a job as a Senior Database Administrator at a small but rapidly growing company. My first order of business was to clean up the logical design of the main database (adding primary and foreign keys to existing tables, removing database objects that were no longer in use, eliminating redundant indexes, adding missing indexes, etc.), but now that the preliminary logical cleanup is complete I have turned my attention to the physical design.

The current production setup is a physical database server with 4 sockets and 16 cores, hyper-threaded to 32 logical processors, and 256 GB of memory. CPU utilization, even during the busiest times, is typically under 15%.

For storage we have four drives, C, D, E and T, with C for the OS, D for the data files, E for logs and T for TempDB. The primary data file for the main database is currently 650 GB and growing as the company grows. There are 700 GB free on the D: drive, which is an SSD in RAID 10 with 1.45 TB of total capacity.

We are using SQL Server 2014, but the main database is in 2008 compatibility mode due to a performance issue with one of the stored procedures that occurs in 2012/2014 compatibility mode. (In 2014 compatibility the performance problem is consistent; in 2012 compatibility it only presents itself under load.)
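For reference, compatibility level is set per database; this is how we check it and pin it (MainDB is a placeholder for the actual database name):

    -- Check the current compatibility level (100 = 2008, 110 = 2012, 120 = 2014)
    SELECT name, compatibility_level
    FROM sys.databases
    WHERE name = N'MainDB';

    -- Pin the database to 2008 behavior
    ALTER DATABASE MainDB SET COMPATIBILITY_LEVEL = 100;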

The immediate change that I would like to make is to take the one 650 GB data file for the main database and split it into 2 filegroups (one for the main data, the other for non-clustered indexes) with 8 files per filegroup; a sketch of the intended DDL follows my two questions below. So my questions are these:

1) Should I expect any gains in performance or reliability from breaking up the main data file in the manner I described?

2) How do I make these changes on a production database server where I can schedule (at most) a 30-minute downtime window per day?
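For concreteness, here is a minimal sketch of the DDL I have in mind; the database name, file paths, sizes, and growth settings are placeholders that would need tuning:

    ALTER DATABASE MainDB ADD FILEGROUP FG_Data;
    ALTER DATABASE MainDB ADD FILEGROUP FG_Indexes;

    -- One file shown per filegroup; repeat for files 02 through 08 in each
    ALTER DATABASE MainDB
    ADD FILE (NAME = N'MainDB_Data_01',
              FILENAME = N'D:\SQLData\MainDB_Data_01.ndf',
              SIZE = 40GB, FILEGROWTH = 4GB)
    TO FILEGROUP FG_Data;

    ALTER DATABASE MainDB
    ADD FILE (NAME = N'MainDB_Idx_01',
              FILENAME = N'D:\SQLData\MainDB_Idx_01.ndf',
              SIZE = 20GB, FILEGROWTH = 4GB)
    TO FILEGROUP FG_Indexes;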

Also note that while I have access to a number of servers I could use to test these changes prior to making them in production, the non-production hardware is underpowered compared to actual production. For example, the box I am planning to use for my tests hosts a production database copy which is refreshed daily. It is a virtual machine with 8 cores, 35.6 GB of memory, and a D: drive that is a virtual disk hosted on a SAN with 1.19 TB of total space. So while I can test my changes on this machine, I cannot be confident that any performance gains seen there will be representative of actual production.

So far, all attempts to move towards the state I have described have involved my session getting stuck during a DbccFilesCompact command (the command sys.dm_exec_requests reports during a file shrink). When this happens, percent_complete does not increment, the SPID sits suspended, and sys.dm_exec_requests shows it cycling through PAGEIOLATCH_SH waits on various resources. I'm not sure how long it would need to complete (if it could complete at all), but I have let it run for hours at a time with no progress. I can kill the process, and the kill resolves immediately, but when the process is killed no progress has been made towards the desired end state.
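For anyone reproducing this, this is roughly the query I run from a second connection to watch the stuck session:

    SELECT r.session_id,
           r.command,           -- shows DbccFilesCompact during the shrink
           r.status,            -- sits at 'suspended'
           r.percent_complete,  -- never increments
           r.wait_type,         -- cycles through PAGEIOLATCH_SH
           r.wait_resource
    FROM sys.dm_exec_requests AS r
    WHERE r.command = N'DbccFilesCompact';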

Has anyone faced this problem in their own companies?
If so, what was the resolution?

Best Answer

I am not a SQL Server DBA; I use MySQL in my job. But some truths are universal. :-)

1) Should I expect any gains in performance or reliability from breaking up the main data file in the manner I described?

I have observed that the benefit of any scalability optimization depends on the queries you run.

It's practically the definition of optimization that any technique improves one type of query workload while worsening some other types of workloads.

2) How do I make these changes on a production database server where I can schedule (at most) a 30-minute downtime window per day?

I would set up a replica server (I gather in SQL Server land these are called Subscribers). Restructure the physical tablespaces of the replica. Then it doesn't matter how long it takes, as long as it can catch up to the master after the restructure is done.
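I'm told the SQL Server idiom for relocating an index to a different filegroup without dropping it first is CREATE INDEX ... WITH (DROP_EXISTING = ON); a sketch with placeholder table, index, and filegroup names (and I gather ONLINE = ON requires Enterprise Edition):

    -- Rebuild the clustered index onto the new data filegroup;
    -- moving the clustered index moves the table's data with it.
    CREATE UNIQUE CLUSTERED INDEX PK_Orders
        ON dbo.Orders (OrderID)
        WITH (DROP_EXISTING = ON, ONLINE = ON)
        ON FG_Data;

    -- Rebuild each non-clustered index onto the index filegroup.
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
        ON dbo.Orders (CustomerID)
        WITH (DROP_EXISTING = ON, ONLINE = ON)
        ON FG_Indexes;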

I'm assuming SQL Server replication is asynchronous and also agnostic about physical layout, as MySQL replication is. But I don't know that SQL Server supports this.

Once you're done restructuring the replica and it has caught up with its master, you can "cut over", probably by changing DNS entries for the respective server instances. Then the downtime is minimal and brief, regardless of the size of the databases. The downtime window should include a few moments to make sure all changes from the master have "drained" and been fully applied to the replica.
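One low-tech, technology-agnostic way to verify the drain: write a marker row on the master after application writes stop, then poll the replica until it arrives. A sketch, assuming a small replicated heartbeat table created for the purpose (dbo.ReplicationHeartbeat is hypothetical):

    -- On the master, after application writes have been stopped:
    DECLARE @marker uniqueidentifier = NEWID();
    INSERT INTO dbo.ReplicationHeartbeat (Marker, WrittenAt)
    VALUES (@marker, SYSUTCDATETIME());
    SELECT @marker AS CutoverMarker;  -- note this value

    -- On the replica, poll until the marker appears; once it does,
    -- everything written before it has been applied:
    SELECT Marker, WrittenAt
    FROM dbo.ReplicationHeartbeat
    WHERE Marker = '<marker value noted above>';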

The tradeoff is that you need an extra set of DB server(s), at least temporarily. This is a case where using virtual servers or cloud servers can really save you!