Sql-server – Is database partitioning for performance reasons meaningful in a virtualized environment

partitioningperformancequery-performancesql serverstoragevirtualisation

When running SQL server on physical hardware, placing database data and log files on separate disks or RAID arrays can benefit performance, resilience, and maintenance. Partitioning the database over separate physical disks and RAID arrays is also an option. Simply put, more spindles is better.

When SQL server is running in a virtual environment, the concept of spindles applies differently. You often have little to no direct control over what physical disks your virtual disks are mapped to. Does this mean that it doesn't make sense to partition a database, or split the data and log files over separate virtual disks, the same way we do with physical ones?

I understand that there are other reasons to partition databases that still apply in virtual environments. For instance, large read-only tables can be in a separate partition that only needs to be backed up once, but this is not the subject of this post.

Best Answer

This is an old question, but I came across this and was appalled at one particular comment made, suggesting the original asked was "confused". The question is perfectly clear, just not strictly about database administration. It falls into the fields of server virtualization and storage provisioning in virtual environments more than anything. It may well be that this question had been better suited for a different StackExchange site, but arrogantly dismissing the question because one does not understand the issue is unacceptable.

Let me attempt to explain the question better and give my best advice on the subject, in an attempt to leave something useful here for any others that may get dropped here by a search engine.

When deploying a database server, it has been considered best practice to put the OS, data files and log files on separate sets of disks. Just as a random example, let's say we have a server consisting of three RAID arrays:

2 HDDs in RAID 1 for the OS 4 HDDs in RAID 10 for the database data files 4 HDDs in RAID 10 for the database log files This setup would separate the IO and create separate points of failure for the data files and logs, as well as keep it separate from the OS. This was done for performance, resilience and maintenance reasons. Simple enough, right?

But what then, if we are deploying a database server in a virtual environment? There are no physical disks in a virtual environment. No there really aren't, unless you are mapping the virtual machine disk 1:1 to a physical disk, which is not what you'll see done in your typical virtual server farm. Even then, there would be a virtualization layer in between the physical disk and the virtual server, presenting the physical disk as attached to a virtual disk controller rather than the physical controller the disk is actually attached to.

So what is the problem? Let's take a small virtual server deployment as an example. A bunch of rack servers running in a virtualization cluster and one SAN with a bunch of disk groups in RAID 5 and RAID 6 with automatic storage tiering. All virtual disks are stored in the SAN in this example.

How do you go about assigning separate physical disk groups to your virtual database server now? Or maybe you don't? That was the question asked here. No useful answers were given, and understandably so, since it's a complex question where the answer depends a lot on your particular deployment. Some setups might give you very tight control over what kind of and what specific physical disks your virtual disks end up on, whereas others might be more of a black box that handles everything automagically.

In the simple "one-SAN" example I outlined above, I'd say that you don't really need to. But you might still want to. Even if you can't actually control what physical disks the IO ends up at, there are other benefits to splitting the data up. What if you migrate to a new virtual environment in the future, where you do get separate LUNs with different and known performance characteristics? If the data are already split over several virtual disks, moving them to new LUNs with the appropriate IO capability is much easier. In many hypervisors you can also give different IO priority to virtual disks. Again, this gives you some extra control. More accurate control over snapshot behaviour also becomes possible. Each database gets it's own separate underlying file system, that it doesn't need to share with other databases on the same server or even the OS itself.

Basically, this boils down to your performance requirements, size of the database, hypervisor and storage solutions at your disposal and many, many other factors. I hope I've pointed out a couple of useful tips. Plan ahead. While virtual environments with SAN/NAS solutions might seem like black boxes in many ways, many of them do have ways to achieve the same performance and redundancy goals as traditional direct storage.