I am investigating two SQL Server 2014 instances that run identical software on identical hardware (VMs) and are loaded with the same test database, yet one shows hugely greater performance on a microbenchmark.
The benchmark essentially runs the following in a loop:
```sql
BEGIN TRANSACTION
INSERT INTO [tablename] ([data])
VALUES (CONVERT(NVARCHAR(2000), NEWID()))
COMMIT
```
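For reference, the loop can be reproduced as a single batch. The table definition below is an assumption (the actual schema isn't shown); the table and column names are placeholders:

```sql
-- Placeholder schema for the benchmark target table
CREATE TABLE [tablename] ([data] NVARCHAR(2000) NOT NULL);
GO

-- Run the insert transaction in a tight loop
DECLARE @i INT = 0;
WHILE @i < 10000
BEGIN
    BEGIN TRANSACTION;
    INSERT INTO [tablename] ([data])
    VALUES (CONVERT(NVARCHAR(2000), NEWID()));
    COMMIT;
    SET @i += 1;
END;
```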
One server can achieve approximately 200 transactions per second. The other, approximately 2000 transactions per second.
Since each transaction does so little, I hypothesized that the bottleneck could be disk I/O in flushing the log buffer, and that the DELAYED_DURABILITY option was sure to explain the difference. But alas, it is set to Disabled on both servers, and an audit of the remaining options has turned up no apparent configuration differences.
So my questions are:
- Is there any other way that transaction durability could be delayed when that option is set to Disabled, e.g. a master override config file somewhere?
- Is there any other way that log buffer flushing could be influenced?
- Are there any other settings I should look into that might explain the wide variance in performance that I am seeing?
N.B. I'm aware that this is a microbenchmark and is not representative of the application workload. However, real-world testing has shown a performance difference too, and this transaction loop appears to be the minimal reproducible example that exhibits it.
Additional details
Looking at the wait stats, WRITELOG is top of the list.
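(The wait figures above come from the wait-stats DMV; a query along these lines, run on both servers, gives a comparable snapshot:)

```sql
-- Snapshot WRITELOG waits since the last stats clear / restart;
-- run on both servers and compare counts and average wait times.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       wait_time_ms / NULLIF(waiting_tasks_count, 0) AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = N'WRITELOG';
```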
Best Answer
If the performance of this minimal test is that drastically different, and you have thoroughly confirmed that the software and the instance- and database-level settings are the same (for example, checkpoint-related settings such as indirect checkpoint, recovery interval, and target recovery time can affect a benchmark like this one), then the statement that the hardware is the same is probably not exactly right.
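The database-level settings mentioned above can be compared side by side with a query like the following (the database name is a placeholder; the instance-level recovery interval is checked separately via sp_configure):

```sql
-- Compare durability- and checkpoint-related settings on both instances.
SELECT name,
       delayed_durability_desc,          -- expected: DISABLED on both
       target_recovery_time_in_seconds,  -- 0 = automatic checkpoints,
                                         -- >0 = indirect checkpoint
       recovery_model_desc
FROM sys.databases
WHERE name = N'YourTestDb';  -- placeholder database name

-- Instance-level recovery interval (requires 'show advanced options')
EXEC sp_configure 'recovery interval (min)';
```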
You mentioned that WRITELOG is your highest wait. With this workload, WRITELOG will almost inevitably be the highest wait, since log flushing is all you're doing. Is the difference in WRITELOG waits between the two servers large enough to account for the 10x difference in transactions per second? If that wait is significantly higher on the slow server, then it's possible that, despite the same hardware specs being specified at the VM level, the "path" to the I/O system is slower on the second VM. Or the underlying storage is slower.
Check the sys.dm_io_virtual_file_stats DMV to see whether the slow server is experiencing slower writes, or more frequent and longer stalls, than the fast server. If you see slowness reported there, you'll need to work with the VM administrator to find out whether there is a difference in the underlying storage, or in the access path to that storage (if it's going out to a SAN).