SQL Server – SAN for SQL Server: one large spindle set for all loads, or discrete sets per load?

Tags: san, sql-server, sql-server-2014, storage

SQL Server 2014, Standard Edition (running under VMware, FWIW).
Adding a Dell MD3420 DAS box.

In the old school, you would set up discrete spindle sets for MDF, LDF, TEMPDB, BACKUP.

E.g.

  • MDF: Qty 6: 300GB-15K, RAID 10 // 900GB storage
  • LDF: Qty 4: 150GB-15K, RAID 10 // smaller data footprint, spindles to increase IOPS
  • TEMPDB: Qty 4: 150GB-15K, RAID 10 // smaller data footprint, spindles to increase IOPS
  • BACKUP: Qty 2: 2TB, RAID 1

Use Case:
This system has 5 databases, and is OLTP.

Nowadays, I am being told: just set up Qty 10: 600GB-15K drives and put MDF, LDF, TEMPDB, and BACKUP all on there. The overall higher performance of the larger spindle set more than makes up for the contention of writing to LDF and BACKUP concurrently during backups.

  • a) Is this new way the right way?
  • b) LDF is supposed to be discrete relative to MDF… I feel funny!
  • c) I really feel backups should be their own spindle set (for a bunch of reasons)

So who wins? SQL tradition, or the Dell storage architect?

Best Answer

This is a database recovery question first and foremost, then performance.

DB Recovery:

If you keep the LDF and the MDF on the same physical array, you lose any chance of performing a 'tail of log' backup when that array fails. That means your data is recoverable only up to one of the following:

  • The last transaction log backup taken and copied off the failed array.

  • A mirrored pair, availability group, or replicated copy of the database elsewhere.

So the first question is 'how much data can you lose?'. If your tolerance for data loss is smaller than the interval at which you back up your logs, putting the LDF on that shared array isn't going to meet your data loss objective.
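To make the recovery point concrete: a tail-of-log backup is only possible when the log file survives the failure that took out the data file, which is the whole argument for keeping LDF on a separate array. A minimal sketch (database name and backup path are placeholders):

```sql
-- Tail-of-log backup sketch: assumes the MDF array failed but the LDF array is intact.
-- NO_TRUNCATE lets you back up the log even though the database is damaged;
-- NORECOVERY leaves the database in a restoring state for the restore sequence.
BACKUP LOG YourDb
TO DISK = N'X:\Backup\YourDb_tail.trn'
WITH NO_TRUNCATE, NORECOVERY;
```

If MDF and LDF share one array, a failure of that array destroys both files at once and this option disappears.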

Performance:

The rest really depends on your usage patterns. It seems like you do not have an active machine we can get a baseline from; if I'm wrong, let me know. Assuming we don't, understand that disk I/O is critical, especially when there may not be enough RAM or in a high-transaction environment. Do you expect it to be highly transactional? If so, give lots of disks to the transaction log drive. That version of the Dell array lets you short-stroke your disks, which should give you a good performance boost, since transaction log drives typically go multi-disk not for space but for write performance.
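Once there is any workload to measure, you can see whether log writes are actually suffering rather than guessing. A sketch using the standard `sys.dm_io_virtual_file_stats` DMV (the latency thresholds you judge it against are rules of thumb, not part of the query):

```sql
-- Average write latency per database file since instance start.
-- High avg_write_latency_ms on LDF files is the signal that the log
-- needs dedicated (or faster) spindles.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes
       END AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON vfs.database_id = mf.database_id
 AND vfs.file_id = mf.file_id
ORDER BY avg_write_latency_ms DESC;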

Putting TempDB on its own disks used to be standard advice, but if you don't have a clear reason to do it, you should test it, or just put it in the same array and test that. TempDB can have odd access patterns, so you might see varying performance and growth. Many people do just put it on the same disks as the data. It's up to you in this case.
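If you want evidence for the TempDB decision, one common check is whether sessions are waiting on TempDB allocation pages while a representative load runs. A sketch against the standard `sys.dm_os_waiting_tasks` DMV (database_id 2 is TempDB):

```sql
-- Sessions currently waiting on latches for pages in tempdb (database_id = 2).
-- Sustained PAGELATCH waits on tempdb allocation pages suggest TempDB contention;
-- if you never see them, separate TempDB spindles buy you little.
SELECT session_id,
       wait_type,
       wait_duration_ms,
       resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE 'PAGELATCH%'
  AND resource_description LIKE '2:%';
```

This only shows in-flight waits, so sample it repeatedly during the test load rather than running it once.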

It can sometimes be worth segregating disks by application access pattern and the databases that support them; I'm not sure you have that option. If you don't, your layout looks fine, but without testing it's impossible to tell. Personally, I would test as much as I could first, look at how much growth is expected, and then decide whether it makes more sense to add IOPS to the main data array or to keep the sets separate.

Are you able to do any kind of benchmarking at all?