SQL Server – How to calculate required resources from a baseline, and what should be measured in the first place

monitoring, sql-server, sql-server-2014, sql-server-2008

I have to deal with the following situation:

Currently we have

  • 1 cluster with 5 nodes running 10 instances (SQL Server 2008 Enterprise edition), and
  • 1 cluster with 2 nodes running 5 instances (SQL Server 2014 Standard edition)

in use. All servers are virtual machines running on VMware.

We want to set up a new cluster (SQL Server 2014 Enterprise). In a first step, 1 instance from the 2008 Enterprise cluster and 1 instance from the 2014 Standard cluster are to be migrated.

Therefore, my boss asked 2 questions:

  1. How many cores do we need (aim: minimize license costs)?
  2. How much RAM do we need?

My answer was: "It depends …" Now I have to deliver hard facts by monitoring over the next few weeks. Great! (irony intended)

My approach for question number 1:

Using perfmon.exe, I plan to monitor

  • Processor\% Processor Time (_Total and individual cores),
  • Processor\% User Time (_Total and individual cores),
  • Processor\% Interrupt Time (_Total and individual cores) – is this really necessary? – and
  • System\Processor Queue Length.

The question is: where should I get this data from? From the node? From the SQL Server instance?

In the first case it should be easy: the first instance in question – vsql2008ent-1\instanceX, for the sake of simplicity – currently runs on a node, let's call it sql2008NodeA. Under normal conditions no other instances or servers run on this node, so it should not matter where I take the data from, should it? In case of a disaster other instances will run on this node too, but we want a baseline for normal operation.

The second instance – vsql2014stan-1\instanceY – shares its node – sql2014NodeA – with 2 other instances. In this case I can never be sure how many cores the instance would truly need for smooth operation, right? I can monitor the instance, but what does the result mean? It only shows the CPU resources this instance actually used. Would more cores have been used if they had been available? So what would be the answer to question 1 above?
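One idea I'm toying with (just a sketch, and I don't know yet whether it answers the "would more cores have been used" part): every instance keeps roughly the last few hours of CPU samples in the scheduler monitor ring buffer, split into "this instance" vs. "everything else on the node". That would at least show how much of the node's CPU instanceY itself consumed compared to its neighbours:

    -- Rough sketch: recent CPU utilization as seen from inside one instance,
    -- taken from the scheduler monitor ring buffer (about one sample per minute).
    -- ProcessUtilization = this instance's share, SystemIdle = idle time;
    -- the remainder is everything else running on the node.
    DECLARE @ts_now bigint =
        (SELECT cpu_ticks / (cpu_ticks / ms_ticks) FROM sys.dm_os_sys_info);

    SELECT TOP (240)
           DATEADD(ms, -1 * (@ts_now - [timestamp]), GETDATE()) AS sample_time,
           record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS sql_cpu_pct,
           record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')         AS idle_pct,
           100
             - record.value('(./Record/SchedulerMonitorEvent/SystemHealth/SystemIdle)[1]', 'int')
             - record.value('(./Record/SchedulerMonitorEvent/SystemHealth/ProcessUtilization)[1]', 'int') AS other_cpu_pct
    FROM (
        SELECT [timestamp], CONVERT(xml, record) AS record
        FROM sys.dm_os_ring_buffers
        WHERE ring_buffer_type = N'RING_BUFFER_SCHEDULER_MONITOR'
          AND record LIKE N'%<SystemHealth>%'
    ) AS rb
    ORDER BY sample_time DESC;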

RAM is the other question. Because of several disasters in the past, when all instances landed on the same node, I have set an upper limit for max server memory on each instance. This limit depends on the available memory of the node (currently 100 GB or 120 GB, respectively). So how do I monitor this? If all memory is used up, everything seems clear: insufficient memory. If everything is slow: insufficient memory. But how much memory do I really need?
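As a starting point I could probably watch a couple of instance-level memory counters alongside perfmon; a minimal sketch (on a named instance the object_name prefix is MSSQL$<InstanceName>: instead of SQLServer:, hence the LIKE patterns):

    -- Rough sketch: page life expectancy plus target vs. total server memory.
    -- A PLE that keeps dropping during the busy period, or a target that stays
    -- pinned at the max server memory cap, would hint that the cap is too tight.
    SELECT [object_name], counter_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE (counter_name = N'Page life expectancy'
           AND [object_name] LIKE N'%Buffer Manager%')
       OR (counter_name IN (N'Target Server Memory (KB)', N'Total Server Memory (KB)')
           AND [object_name] LIKE N'%Memory Manager%');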

I try to summarize my questions:

  1. Where should I take the measurements from (node vs. instance)?
  2. Do I need to monitor interrupt time if I want to know the number of cores required?
  3. What should I monitor, under the given circumstances, to determine how much RAM I need (I know: "the more, the better")?

Thank you very much for your help!

Best regards!

Best Answer

The question of whether and how to directly measure CPU core usage is beyond my expertise, but here is what I would consider trying:

Run a standard Profiler trace, with the database name column added, during your normally busiest period. Total up the CPU column of the SQL:BatchCompleted and RPC:Completed events by database and you'll get a rough idea of how much CPU (possibly spread across multiple cores) each database is consuming. (Perhaps also total up the CPU column of the other events to see whether anything major was missed, and save the trace "as trace table" for analysis.)
Exactly how to translate that into a number of cores, I can't say. But if you also measure total system CPU usage during the Profiler run, you can estimate each database's share of the total.
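Once the trace has been saved, totalling up CPU per database is a simple aggregate; a minimal sketch, assuming the trace was written to C:\traces\baseline.trc (a placeholder path) and includes the DatabaseName column:

    -- Total CPU (ms) per database for RPC:Completed (event 10) and
    -- SQL:BatchCompleted (event 12) from a saved trace file.
    SELECT DatabaseName,
           SUM(CPU)  AS total_cpu_ms,
           COUNT(*)  AS batches
    FROM sys.fn_trace_gettable(N'C:\traces\baseline.trc', DEFAULT)
    WHERE EventClass IN (10, 12)
    GROUP BY DatabaseName
    ORDER BY total_cpu_ms DESC;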

Note: if your server handles fewer than a few hundred batch requests per second (see the SSMS Activity Monitor), a standard Profiler trace, even run across the network, will almost certainly not affect performance. If you instead script a server-side trace, it can handle more requests per second without slowing anything down, but I make no promises for your environment.
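For what it's worth, a minimal server-side trace along those lines could look like the sketch below; the file path and the eight-hour stop time are only placeholders for your environment:

    -- Minimal server-side trace: CPU and DatabaseName for RPC:Completed (10)
    -- and SQL:BatchCompleted (12), rolling over at 100 MB per file.
    DECLARE @traceid int, @maxfilesize bigint = 100, @on bit = 1;
    DECLARE @stoptime datetime = DATEADD(HOUR, 8, GETDATE());

    EXEC sp_trace_create @traceid OUTPUT, 2, N'C:\traces\baseline', @maxfilesize, @stoptime;

    -- Columns: 13 = Duration, 14 = StartTime, 18 = CPU, 35 = DatabaseName
    EXEC sp_trace_setevent @traceid, 10, 13, @on;
    EXEC sp_trace_setevent @traceid, 10, 14, @on;
    EXEC sp_trace_setevent @traceid, 10, 18, @on;
    EXEC sp_trace_setevent @traceid, 10, 35, @on;
    EXEC sp_trace_setevent @traceid, 12, 13, @on;
    EXEC sp_trace_setevent @traceid, 12, 14, @on;
    EXEC sp_trace_setevent @traceid, 12, 18, @on;
    EXEC sp_trace_setevent @traceid, 12, 35, @on;

    EXEC sp_trace_setstatus @traceid, 1;   -- start the trace
    SELECT @traceid AS traceid;            -- keep this to stop (status 0) and close (status 2) it later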

For RAM, I wonder whether http://www.sqlshack.com/sql-server-memory-performance-metrics-part-4-buffer-cache-hit-ratio-page-life-expectancy/ might help you determine whether your instances need less or more. I don't think there's a clean way to do this per database, though.
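One thing that might come close: grouping sys.dm_os_buffer_descriptors by database_id shows what each database currently holds in the buffer pool (not what it would actually need, and it can be slow on instances with a lot of memory):

    -- Rough sketch: current buffer pool usage per database, in MB (8 KB pages).
    SELECT CASE WHEN database_id = 32767 THEN N'ResourceDb'
                ELSE DB_NAME(database_id) END AS database_name,
           COUNT(*) * 8 / 1024               AS buffer_pool_mb
    FROM sys.dm_os_buffer_descriptors
    GROUP BY database_id
    ORDER BY buffer_pool_mb DESC;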