We implemented a feature using Service Broker and recently deployed it to production. I am observing that the "Stored Procedures Invoked/sec" counter is always increasing, while other counters such as Tasks Running are stable. Why doesn't "Stored Procedures Invoked/sec" stay within a range?
SQL Server – Increasing ‘Stored Procedures Invoked/sec’ Counter
monitoring, performance, service-broker, sql-server, sql-server-2008-r2
Related Solutions
Thank you for the detailed explanation of your problem (one of the best laid-out questions, actually).
WRITELOG is a very common wait type, so don't worry about it. The SOS_SCHEDULER_YIELD waits indicate CPU pressure, and together with the CXPACKET waits they suggest there may be missing indexes, and that you may be retrieving a lot of data from your queries for an OLTP system. I suggest you look at the Missing Indexes DMVs and check whether any suggested indexes (almost certainly there will be more than a few) belong to the procedures in question.
http://sqlfool.com/2009/04/a-look-at-missing-indexes/
Look for Jonathan Kehayias's post on sqlblog.com on this too.
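A starting point for the Missing Indexes DMVs is a query along these lines (a sketch only; the impact weighting is a common heuristic, not an official formula, and every suggestion should be reviewed before creating an index):

```sql
-- Sketch: top missing-index suggestions, weighted by estimated impact.
-- Treat these as hints to investigate, not as commands to run blindly.
SELECT TOP (25)
    mid.statement            AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.avg_user_impact,
    migs.user_seeks * migs.avg_total_user_cost
        * (migs.avg_user_impact / 100.0)  AS est_benefit
FROM sys.dm_db_missing_index_groups AS mig
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
JOIN sys.dm_db_missing_index_details AS mid
    ON mid.index_handle = mig.index_handle
ORDER BY est_benefit DESC;
```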
Also, take a look at Parameter sniffing.
http://sommarskog.se/query-plan-mysteries.html
http://pratchev.blogspot.com/2007/08/parameter-sniffing.html
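One common mitigation for parameter sniffing, discussed in those links, is to recompile the statement per call so the plan fits the actual parameter value (a sketch; the procedure, table, and column names here are invented for illustration):

```sql
-- Hypothetical procedure showing the OPTION (RECOMPILE) anti-sniffing
-- pattern: the statement is recompiled on every execution, so the plan
-- is built for the actual @Status value rather than a sniffed one.
CREATE PROCEDURE dbo.GetOrdersByStatus
    @Status int
AS
BEGIN
    SELECT OrderId, CustomerId, CreatedAt
    FROM dbo.Orders
    WHERE Status = @Status
    OPTION (RECOMPILE);
END;
```

The trade-off is extra compilation CPU on every call, so this suits procedures with wildly skewed parameter distributions rather than hot, uniform ones.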
It's NOT a complete answer for your needs, but a good starting point. Let us know if you need more details.
Others have already pointed out the culprit: SQL Server accumulates updates in memory (in the buffer pool) and only flushes them out periodically (at checkpoints). The two options suggested (-k and checkpoint interval) are complementary:
- -k makes the checkpoint issue less aggressive IO requests, so it lasts longer
- lowering the recovery interval makes checkpoints start more often
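Concretely, the two knobs are set in different places (a sketch; -k is an instance startup parameter, not a T-SQL command, and the one-minute value is only illustrative):

```sql
-- Lower the recovery interval so checkpoints fire more often
-- (0 = automatic; here forced to roughly one minute).
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'recovery interval (min)', 1;
RECONFIGURE;

-- -k<decimal> (e.g. -k100 for ~100 MB/s) is added to the SQL Server
-- service startup parameters; it throttles checkpoint IO throughput.
```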
But I did not respond only to regurgitate the fine comments you received so far :)
What you're seeing is, unfortunately, very typical behavior for queued processing. Whether you use Service Broker queues or the tables-as-queues approach, the system is very prone to this kind of behavior, because queue-based processing is write heavy, even more write heavy than OLTP processing. Both the enqueue and dequeue primitives are write operations, and there are almost no read operations. Simply put, queue processing generates the most writes (= the most dirty pages, and the most log) of any workload, even OLTP (i.e. a TPC-C-like workload).
Very importantly, the writes of a queue workload follow an insert/delete pattern: every row inserted is very quickly deleted. This is important to distinguish from the append-only pattern of an insert-heavy (ETL) workload. You are basically feeding the ghost cleanup task a full meal, and you can easily outrun it. Think about what that means:
- enqueue is an insert, it will create a dirty page
- dequeue is a delete, and it will dirty the same page again (it may be lucky and catch the page before the checkpoint, thus avoiding a double flush, but only if it is lucky)
- ghost cleanup will cleanup the page, making it dirty again
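In the tables-as-queues variant, the first two write points are easy to see in code (a sketch; the table, column, and payload values are invented, and the dequeue uses the well-known DELETE ... OUTPUT pattern):

```sql
-- Hypothetical minimal queue table.
CREATE TABLE dbo.WorkQueue (
    Id      bigint IDENTITY(1,1) PRIMARY KEY,  -- ever-increasing insert point
    Payload varbinary(max)
);

-- Enqueue: an insert, dirtying the page.
INSERT INTO dbo.WorkQueue (Payload) VALUES (0x01);

-- Dequeue: a delete of the oldest row, dirtying the page again;
-- the deleted row then becomes a ghost for the cleanup task.
WITH next_item AS (
    SELECT TOP (1) Payload
    FROM dbo.WorkQueue WITH (ROWLOCK, READPAST)
    ORDER BY Id
)
DELETE FROM next_item
OUTPUT deleted.Payload;
```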
Yes, it really means that you may end up writing a page three times to disk, in three different IO requests, for each message you process (worst case). It also means that the random IO of checkpoints will be truly random, as the page's write location will be visited by the moving disk heads again between any two checkpoints (compare with many OLTP workloads, which tend to group their writes on a few 'hot spots'; not so with queues...).
So you have these three write points racing to mark the same page dirty again and again. And that is before we consider any page splits, which queue processing may also be prone to because of the insert key order. By comparison, 'typical' OLTP workloads have a much more balanced read/write ratio, and the OLTP writes are distributed across inserts/updates/deletes, often with updates ('status' changes) and inserts taking the lion's share. Queue processing writes are exclusively inserts/deletes with, by definition, a 50/50 split.
Some consequences follow:
- Checkpoint becomes a very hot issue (no longer a surprise for you)
- You'll see heavy fragmentation (the fragmentation per se won't matter much since you are not going to do range scans, but your IO efficiency suffers and ghost cleanup has more work to do, slowing it down even more)
- Your MDF storage random IO throughput is going to be your bottleneck
My recommendation comes in three letters: S, S and D. Move your MDF to storage that can handle fast random IO. SSD. Fusion-IO if you have the money. Unfortunately, this is one of those symptoms that cannot be solved with more cheap RAM...
Edit:
As Mark points out, you have two logical disks backed by one physical disk. Perhaps you tried to follow best practices by splitting the log onto D: and the data onto C:, but alas, it is to no avail: C and D are the same disk. Between checkpoints you achieve sequential throughput, but as soon as a checkpoint starts, the disk heads begin to move and your log throughput collapses, taking down the entire application's throughput. Make sure you separate the DB log so that it is not affected by the data IO (put it on a separate physical disk).
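Relocating the log file to a genuinely separate physical disk looks like this (a sketch; the database name, logical file name, and path are placeholders, and the database must be taken offline while the .ldf file is physically moved to the new location):

```sql
-- Point the log file at a different physical disk. The change takes
-- effect the next time the database is started, after the existing
-- log file has been copied to the new path.
ALTER DATABASE MyQueueDb
MODIFY FILE (
    NAME     = MyQueueDb_log,
    FILENAME = 'E:\SQLLogs\MyQueueDb_log.ldf'
);
```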
Best Answer
There are several different types of performance counters, five of which are used by SQL Server (at least in terms of what you will find in the sys.dm_os_performance_counters DMV). Querying that DMV for its distinct cntr_type values returns five counter types. Those values mean:

Noncomputational Counter Types

- 65792 == PERF_COUNTER_LARGE_RAWCOUNT == "Raw counter value that does not require calculations, and represents one sample which is the last observed value only."

Counter Algorithm Counter Types

- 272696576 == PERF_COUNTER_BULK_COUNT == "Average number of operations completed during each second of the sample interval."
- 1073874176 == PERF_AVERAGE_BULK == "Number of items processed, on average, during an operation. This counter type displays a ratio of the items processed (such as bytes sent) to the number of operations completed, and requires a base property with PERF_AVERAGE_BASE as the counter type."

Base Counter Types

- 1073939712 == PERF_LARGE_RAW_BASE == "Base value found in the calculation of PERF_RAW_FRACTION"

Basic Algorithm Counter Types

- 537003264 == PERF_LARGE_RAW_FRACTION == "Ratio of a subset to its set as a percentage. This counter type displays the current percentage only, not an average over time."

You can find more info in the following blog post: Interpreting the counter values from sys.dm_os_performance_counters
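The five counter types in use can be listed with a query like this (a reconstruction of the kind of query the answer refers to, since the original was lost in formatting):

```sql
-- List the distinct counter types SQL Server actually uses in this DMV.
SELECT DISTINCT cntr_type
FROM sys.dm_os_performance_counters
ORDER BY cntr_type;
```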
The Stored Procedures Invoked/sec counter in the SQLServer:Broker Activation category is of type 272696576 (i.e. PERF_COUNTER_BULK_COUNT). These values are cumulative. To derive a per-second rate, you need to capture a value into a variable, wait for N seconds (i.e. WAITFOR DELAY '00:00:30'; for 30 seconds), capture a second value, and then divide the difference between the values by N (i.e. SELECT (@var2 - @var1) / 30.0;).
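Putting the steps together (a sketch; the 30-second delay and the variable names are illustrative, and the object_name filter assumes a default instance):

```sql
-- Sample the cumulative counter twice, 30 seconds apart, and compute
-- the per-second rate from the difference.
DECLARE @var1 bigint, @var2 bigint;

SELECT @var1 = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Stored Procedures Invoked/sec'
AND   object_name LIKE N'%Broker Activation%';

WAITFOR DELAY '00:00:30';  -- N = 30 seconds

SELECT @var2 = cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name = N'Stored Procedures Invoked/sec'
AND   object_name LIKE N'%Broker Activation%';

SELECT (@var2 - @var1) / 30.0 AS [StoredProceduresInvokedPerSec];
```

Note that if the queue has multiple activated procedures, there is one counter instance per queue, so you may want to aggregate with SUM(cntr_value) instead of assigning a single row to the variable.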