Sql-server – Memory threshold is attempted on SQL Server

memoryprocesssql server

Currently we have a monitoring for all the infrastructure (server, DB, network components, etc …). The alert thresholds, usually saturation, are configured as follows:

  • 90% : Warning
  • 95% : Critical

Once the threshold is reached, an alert is generated and sent as an email or via a ticketing solution.
To be more concrete for our case, we have a server with the following config:

  • Server memory : 8 Gb
  • SQL max memory : 7 Gb

Once SQL reaches the max memory, if we also add OS processes, the threshold of 90% is reached, then an alert is generated.
Among the following solutions to avoid the alert, which is the most optimal:

  • Physical memory extension
  • Modification of the alert threshold from 90% to 95%
  • Empty the unused SQL chase (if possible)
  • Decreasing SQL max memory

Otherwise if you have other solutions do not hesitate to share it.

Best Answer

Sadly, sometimes we get less than helpful information to deal with throughout our busy days.

Q. What do Quantified Alerts tell us?

  • That service A is doing or enduring something.
  • That information is happening on a system.

Q. What will it take to Qualify my alerts?

  • Understand the difference between causation , correlation , unrelated events that a Quantified events have with a Qualified event. Merely having an alert about memory is the same as telling a Sports Athlete that he is using all his muscles during a sprint. It means nothing by itself.
  • Causation are events that directly contribute to impact of measurable scope. An example would be shutting down a service has direct effects on the users.
  • Correlation are events that relate to the impact as passive, precursors, or active consequences from the event. An example could be delays on the app server from consistently high Memory and CPU usage combined with Long IOs recorded on the Error logs, which one is the actual issue? The High usage may prevent the new connection, but the limits for current service may be related to the IOS. The passive result may be that some users experience downtime while others still have access (though slow).
  • Unrelated Events are issues that may or may seem like correlation or causation, but in reality are outliers that have no relevance. An example might be a user that has forgot his email but reports the service is down at the same time other users actually have impact to their services.

Conclusion

Think about the trends of events. Does High Memory usage occur during high load or when off hours occur? Are there bad queries being run at the same time or are many, or large queries happening? The first is an issue and the second and third are expected events on a large server.

At the end of the day, having high Memory usage itself is not necessarily a bad thing. Your server is being used. You need to determine measurable negative events in order to decide if the alert is good or bad.