SQL Server – Data warehouse server: how to calculate RAM/CPU specifications

data-warehouse | sql-server | sql-server-2016 | windows-server

I am trying to write a spec for a data warehouse server for our planned data warehouse upgrade.

Because we run virtual servers on VMware hosts, we can add or remove resources as necessary. In the past we've incrementally added RAM and CPU as required, and as our demands have grown we've lobbied for more resources (primarily disk and RAM).

We ask for more. They give us as little as possible.

Recently, however, whenever we talk about resources we are criticized for not spec'ing the machine correctly in the first place, and I am now being told that the dev hosts are maxed out: there is no more RAM available.

We're a small local government organisation with ~50 regular users of the DW. In normal daily use it runs fine: we get good MDX query performance, and our reports and dashboards are fast. Users are happy.

However, our ETL processes run throughout the night, and we're starting to see evidence of memory pressure when processing datamarts simultaneously. Last night an SSIS job failed with an "out of memory" error.
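One way to confirm whether the database engine itself saw memory pressure during the ETL window is to read the Resource Monitor ring buffer. A minimal sketch of that check follows; note it reflects only the engine's view, and an SSIS out-of-memory error can also come from the package's own buffer allocations:

```sql
-- Recent Resource Monitor notifications; RESOURCE_MEMPHYSICAL_LOW entries
-- around the ETL window suggest the box ran short of physical memory.
SELECT TOP (20)
       DATEADD(ms, rb.[timestamp] - osi.ms_ticks, GETDATE()) AS notification_time,
       CAST(rb.record AS XML).value('(//Notification)[1]', 'varchar(60)') AS notification
FROM sys.dm_os_ring_buffers AS rb
CROSS JOIN sys.dm_os_sys_info AS osi
WHERE rb.ring_buffer_type = N'RING_BUFFER_RESOURCE_MONITOR'
ORDER BY rb.[timestamp] DESC;
```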

Our existing DW server is Windows 2008 R2 with 4 CPUs and 16 GB of RAM, running SQL Server 2012 Standard. I have max server memory set to 12 GB, leaving 4 GB for the OS and services. Our existing DW has three datamarts/OLAP cubes, and we are developing two more.
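For reference, that memory cap is set with sp_configure; a minimal sketch using the 12 GB (12,288 MB) split described above:

```sql
-- Cap the database engine's memory at 12 GB (value is in MB), leaving the
-- rest for the OS and other services. Adjust 12288 to suit your own split.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 12288;
RECONFIGURE;
```

Bear in mind this caps only the database engine; SSIS, SSRS and SSAS manage their own memory outside that limit.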

+----------+----------+---------------+-----------+---------------+
| Datamart/| Files    |  Fact (rows)  | Fact (MB) | ETL & process |
| OLAP cube| (GB)     |               |           | time (hours)  |
+----------+----------+---------------+-----------+---------------+
| PBI      |       3  |  190,000      |  180      |  0.2          |
| FBI      |      30  |  26,100,000   |  10,000   |  1.5          |
| RBI      |     175  |  62,000,000   |  32,000   |  8.3          |
| ABI*     |     100  |  44,050,000   |  21,000   |  4.0          |
| EBI*     |      11  |  100,000,000  |  6,000    |  2.0          |
+----------+----------+---------------+-----------+---------------+
* Planned/Estimated

Our new server is planned to be Windows 2012 running SQL Server 2016 Enterprise. It will run the database engine, SSIS, SSRS and SSAS. Storage isn't an issue, but I'm not sure about RAM and CPU.

According to the Fast Track Data Warehouse Reference Guide for SQL Server 2012, the minimum I should have is 128 GB for a 2-socket machine, which seems a bit excessive. The Hardware and Software Requirements for Installing SQL Server 2016 recommends a minimum of 4 GB of RAM for SQL Server 2016.
That's quite a difference!

So: what is a good starting point? 32 GB? 64 GB? And how do I justify my starting spec to IT?

Are there any good guides about how to calculate server resources?

Are there any good rules of thumb?

What are the key ingredients/metrics for RAM sizing in a DW context?

  • The volume of data?
  • The number of cubes?
  • The time it takes to do ETL or process a cube?
  • Peak processing load overnight or performance as viewed by end users during the day?

Best Answer

Great question, and I did a session about this at TechEd a few years ago called Building the Fastest SQL Servers:

https://channel9.msdn.com/Events/TechEd/NorthAmerica/2012/DBI328

In it, I explain that for data warehouses, you need storage that can provide data fast enough for SQL Server to consume it. Microsoft built a great series of white papers called the Fast Track Data Warehouse Reference Architecture that goes into hardware details, but the basic idea is that your storage needs to be able to provide 200–300 MB/sec of sequential read performance, per CPU core, in order to keep the CPUs busy.
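As a quick sanity check against your own hardware, that rule of thumb can be computed from the core count the instance sees; a sketch, where 200 and 300 MB/sec are the guideline figures above rather than hard limits:

```sql
-- Sequential-read target range implied by the 200-300 MB/sec-per-core guideline.
SELECT cpu_count,
       cpu_count * 200 AS min_target_mb_per_sec,
       cpu_count * 300 AS max_target_mb_per_sec
FROM sys.dm_os_sys_info;
```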

The more of your data that you can cache in memory, the slower storage you can get away with. But you've got less memory than required to cache the fact tables that you're dealing with, so storage speed becomes very important.

Here are your next steps:

  • Watch that video
  • Test your storage with CrystalDiskMark (Here's how); the DMV sketch after this list is a rough in-engine cross-check
  • With 4 cores, you'll want at least 800 MB/sec of sequential read throughput (4 × 200 MB/sec)
  • If you don't have that, consider adding memory until the pain goes away (and caching the entire database in RAM isn't unthinkable)
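As that rough in-engine cross-check for CrystalDiskMark, you can sample SQL Server's cumulative I/O counters before and after a read-heavy interval; a minimal sketch, only meaningful while something like a large table scan or DBCC CHECKDB is driving reads:

```sql
-- Snapshot cumulative read counters, wait 60 seconds under a read-heavy
-- workload, then compute the average read throughput in MB/sec.
SELECT database_id, file_id, num_of_bytes_read
INTO #before
FROM sys.dm_io_virtual_file_stats(NULL, NULL);

WAITFOR DELAY '00:01:00';

SELECT (SUM(fs.num_of_bytes_read) - SUM(b.num_of_bytes_read))
       / 1048576.0 / 60 AS read_mb_per_sec
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS fs
JOIN #before AS b
  ON b.database_id = fs.database_id
 AND b.file_id = fs.file_id;

DROP TABLE #before;
```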

Say you've got a 200 GB database that you're dealing with, and you can't get enough storage throughput to keep your cores busy. It's not unthinkable to need not just 200 GB of RAM but even more, because SSIS and SSAS really want to do their work in memory: you have to have the engine's data available, plus working space for SSIS and SSAS.
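A quick way to put a number on "cache everything" is to total the instance's data files; a minimal sketch (this counts allocated rather than used space, so it slightly overstates):

```sql
-- Total size of all data files in GB (size is stored in 8 KB pages).
-- Add headroom on top of this for SSIS and SSAS work space.
SELECT SUM(CAST(size AS bigint)) * 8 / 1024 / 1024.0 AS total_data_file_gb
FROM sys.master_files
WHERE [type] = 0;  -- 0 = ROWS (data) files; excludes log files
```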

This is also why people try to separate SSIS and SSAS out onto different VMs: they all need memory at the same time.