Sql-server – SQL Server Log Shipping: Where is the hottest IO during RESTORE LOG: To db or to log

sql server

Using SQL Server 2008.

I have used SQL log shipping for years. Works great.

Working with a new standby server (recipient of log shipping), I am seeing very poor log restore performance, and wonder: Of the MDF and LDF, is one hit particularly harder than the other during log restore?

Background:
With dedicated hardware, I have no problem with this area. (To be fair, I have always been unhappy with the ldf size on the target… but what are ya going to do with a db in restore state?)

I am now experimenting with a cloud server and block storage. So far…. I am rather underwhelmed with the IOPS of block storage. The log restore operation is really slow (our main db server puts out a log file every 4 mins, and it takes 2+ mins to restore each one on the target!).

So I am debating whether:

  • get a faster block storage device (mucho $$)
    — But this only makes $$ sense for either the LDF or MDF… If I have to do both, it is just too expensive (ergo my question)
  • figure out another solve for SQL Server in the cloud
  • Forget the cloud for SQL Server, 'cause the iops just aren't there…

Further digging around uncovers: https://serverfault.com/questions/261631/temporary-io-boost-for-an-mssql-server

Which indicates that the hot IO is going to be to the LDF… and that would be the spot to optimize… (though "restore log" is not explicitly the agenda of this thread).

Are we agreed that boosing the IOPS of the LDF disk should boost RESTORE LOG performance?

Thank you!

Best Answer

The data is in:

Improving speed of the LDF storage makes a dramatic improvement in log restore time.

Thank you all for the input!

What I did:

I am using Rackspace Cloud servers. Not their "SQL Server" instances, just plain jane Win 2008 images, that I load SQL 2008 onto. For storage I use Rackspace block storage. I also evaluated Amazon EC2 with EBS (the Amazon block storage).

The core questions I was attacking:

1) Is it viable to run SQL Server on a cloud server for a 300+ GB database?

2) Is there benefit for the extra spend to get the Rackspace SSD based block storage? (It is rather pricey).

The methodology I used:

I did two kinds of testing:

i) SQL IO test tools, to measure disk performance of the block storage, and to compare to dedicated hardware (both "on the metal" and in virtual machines)

ii) An actual restore of a big database, and automated log shipping from the production database.

Answers:

a) Rackspace regular block storage is much faster than Amazon AWS (EBS). Much much.

b) Based on my measurements and experience, I would not put a production SQL Server DB onto Amazon EBS. There are just too many "pauses" and other bad performance issues. Note that SQL Server will work on EC2 w EBS -- many folks do it -- just that based on my measurements there will be IO delays (a lot of them). This will particularly impact RESTORE LOG operations, and all write-intensive operations (reads can be cached... but LDF writes are not cached...).

c) Rackspace cloud with regular block storage can absolutely be used for SQL Server. It is not mind blowing fast, but it is great for reads, and OK for writes.

d) Rackspace SSD block storage makes a dramatic difference for log restore operations, and (I extrapolate here), any write-intensive use of SQL server will get a big boost.

So the bad news is: When you are pushing around a 500gb database, you can't get the performance of a $150K SAN for $300/month.

The good news is: You can get really decent performance, that matches or beats RAID5 from a virtual machine on dedicated hardware, for about $480/month

The Data

Here is the data of the log restore operation, showing first the duration of log restore operations on the slower system, then on the faster system, and then the difference. This log restore is happening every 10 mins on both systems.

Data -- Here is the data of the log restore operation, showing first the duration of log restore operations on the slower system, then on the faster system, and then the difference. This log restore is happening every 10 mins on both systems.