SQL Server 2012 RAM Allocation Issues – How to Safely Allocate More Memory

memorysql server

We're running SQL Server 2012 SP1 CU4 on a 64-bit 12-core, 2 socket server using NUMA architecture server. OS is Windows Server 2008 R2 x64.

Whenever we allocate more than 50% of the physical RAM to SQL server, the machine becomes unstable or unresponsive.

The symptoms are typical of OS memory starvation – i.e. process fail to start, GUI object fail to render, applications misbehave, remote desktop sessions become unresponsive, etc.

We’ve seen this on 2 different machines – one with 192GB RAM, the other with 256GB RAM… as soon as SQL is given more than 50% of the respective physical total, the symptoms appear.

Has anyone else seen this behavior?

— EDIT —

The SQL service runs under an account with LPIM (Lock Pages in Memory) privileges.

McAfee antivirus is, sadly, imposed on the server, though it is at least set up with exclusions for all SQL files.

When the RAM utilization is allowed to exceed 50%, the common behavior we see is:

  • High RESOURCE_SEMAPHORE wait times.
  • Inability to connect to SQL server or run queries.
  • All CPUs maxed out with low disk I/O when running query workloads.

— EDIT 2 —

We've tried re-installing SQL (SP1 CU4), and have checked that nothing else is chomping the RAM. Generally, there's at least 100GB of the total 256GB free at any time.
When we turn off LPIM, we get see the problem "the operating system has swapped at a significant portion of SQL's memory", which is why we have it turned on.

Best Answer

I'd check the perfmon metrics for per-cpu CPU busy, as well as the perfmon metrics instance-wide and per NUMA node for PLE, database pages, etc. Two recent SQL Server KBs

However, for some workloads, especially with higher core count, high concurrent query count, and lots of database disk IO, using trace flags 8015 (disable NUMA support at SQL Server level) and 8048 (remove per-socket query memory allocation botlleneck) will provide even better results than the fixes contained in SQL Server 2012 SP1 CU4. (I confirmed this on our test equipment with simulations of our workload - YMMV)

Trace flag 8015 deserves thorough evaluation before deployment. Memory affinity and attending lower memory latency is sacrificed for managing a single large bpool. It also results in one lazy writer instead of one per NUMA node, and the idea of affinitizing a connection endpoint within a single SQL Server instance to each NUMA node also disappears. But for some workloads the benefits are undeniable. Don't use 8015 without 8048.

I've never seen or heard of any measurable cost to adding trace flag 8048, and if CMEMTHREAD waits and associated spinlocks are triggered by query memory allocation - its the only reliable way to eliminate them.