SQL Server RDS – How Many Read IOPS Are Too Many?

amazon-rds, sql-server

Amazon's best practices indicate you should have enough memory on your RDS instance to hold most of your dataset in memory, and that not having enough will result in more Read IOPS, as data has to be read from disk into memory.

Our instance (SQL Server, m3.medium) generally shows about 150 MB of "freeable memory," which doesn't seem like much of a margin. Our Read IOPS average 6-8, with momentary spikes to as much as 120 (or, in one isolated case, 224).
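
(For context, those figures are the FreeableMemory and ReadIOPS CloudWatch metrics for the instance. A minimal sketch of pulling them programmatically, assuming boto3, an assumed region, and a placeholder instance identifier:)

    import datetime
    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

    def rds_metric(metric_name, instance_id, hours=24):
        """Hourly Average/Maximum datapoints for one RDS CloudWatch metric."""
        now = datetime.datetime.utcnow()
        resp = cloudwatch.get_metric_statistics(
            Namespace="AWS/RDS",
            MetricName=metric_name,
            Dimensions=[{"Name": "DBInstanceIdentifier", "Value": instance_id}],
            StartTime=now - datetime.timedelta(hours=hours),
            EndTime=now,
            Period=3600,
            Statistics=["Average", "Maximum"],
        )
        return sorted(resp["Datapoints"], key=lambda d: d["Timestamp"])

    # "our-sqlserver-instance" is a placeholder identifier; "FreeableMemory" works the same way.
    for point in rds_metric("ReadIOPS", "our-sqlserver-instance"):
        print(point["Timestamp"], round(point["Average"], 1), round(point["Maximum"], 1))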

We are not unhappy with our performance right now (coming from a far-inferior hosting environment), but as we try to decide whether to purchase a reserved instance, we'd like to know we're "ok" on resources for the term of the lease.

Thanks for any insight!

Best Answer

we'd like to know we're "ok" on resources for the term of the lease.

This is a "feeling" now you'll have to put this feeling into tangible, can be tracked and trended. This is how you can make predictions about your workload.

What would this look like? There are three areas you'll need to document:

What are the limits of my hosted solution?

There may be more than these; the list is just to get you started thinking. (A sketch of recording them as a capacity baseline follows the list.)

  1. How much CPU (total) does the hosted environment have? (1 CPU = 100%)
  2. How much RAM (total)?
  3. How many disks, how much total space, IOPS per disk, and total IOPS?
  4. What is my network bandwidth (per interface and in total)?
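
What recording those limits could look like, as a minimal sketch: a capacity baseline captured alongside the instance description. This assumes boto3, an assumed region, a placeholder instance identifier, and illustrative (not measured) hardware figures.

    import boto3

    rds = boto3.client("rds", region_name="us-east-1")  # assumed region

    # Placeholder identifier; the vCPU/RAM figures below are illustrative, not authoritative.
    instance = rds.describe_db_instances(
        DBInstanceIdentifier="our-sqlserver-instance"
    )["DBInstances"][0]

    capacity_baseline = {
        "instance_class": instance["DBInstanceClass"],      # e.g. "db.m3.medium"
        "engine": instance["Engine"],
        "allocated_storage_gb": instance["AllocatedStorage"],
        "vcpus": 1,                                          # documented per instance class
        "ram_gib": 3.75,                                     # documented per instance class
        "provisioned_iops": instance.get("Iops"),            # absent for standard storage
    }
    print(capacity_baseline)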

What should I monitor to trend my workload appropriately?

Whatever the limits of the hosted environment are, you'll definitely want to be gathering those metrics and statistics. There may also be additional, application-defined items that should be included.

For example, if your application is a website that takes orders from users and then fulfills them, there may be some metrics you could add at the application layer (a sketch of publishing them follows the list):

  1. Orders/Hour
  2. Milliseconds to accept/place an order
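
One way to capture those two, sketched with CloudWatch custom metrics; the namespace and metric names are hypothetical:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

    def record_order_placed(elapsed_ms):
        """Publish both application-level metrics each time an order is accepted."""
        cloudwatch.put_metric_data(
            Namespace="OrderApp",  # hypothetical namespace
            MetricData=[
                # Summing this count over an hour gives Orders/Hour.
                {"MetricName": "OrdersPlaced", "Value": 1, "Unit": "Count"},
                # Milliseconds taken to accept/place the order.
                {"MetricName": "OrderPlacementTime", "Value": elapsed_ms, "Unit": "Milliseconds"},
            ],
        )

    record_order_placed(48.0)  # example call after a successful order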

Then the environment metrics, based on the first list above (not exhaustive, just an example; a sketch of checking the per-file IO numbers inside SQL Server follows this list):

  1. CPU usage
  2. Page file / memory
  3. Per-disk IOPS, average seconds per IO (transfer), etc.
  4. Network bandwidth
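
For item 3, SQL Server itself exposes cumulative per-file IO counts and stall times through sys.dm_io_virtual_file_stats. A sketch of reading them from the RDS endpoint, assuming pyodbc, the Microsoft ODBC driver, and placeholder connection details:

    import pyodbc  # assumes the Microsoft ODBC driver for SQL Server is installed

    # Placeholder endpoint and credentials.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=our-sqlserver-instance.example.us-east-1.rds.amazonaws.com,1433;"
        "DATABASE=master;UID=admin_user;PWD=secret"
    )

    # Cumulative per-file reads/writes and average stall per operation since SQL Server started.
    query = """
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.num_of_writes,
           CASE WHEN vfs.num_of_reads = 0 THEN 0
                ELSE vfs.io_stall_read_ms * 1.0 / vfs.num_of_reads END AS avg_read_stall_ms,
           CASE WHEN vfs.num_of_writes = 0 THEN 0
                ELSE vfs.io_stall_write_ms * 1.0 / vfs.num_of_writes END AS avg_write_stall_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
    ORDER BY vfs.io_stall_read_ms DESC;
    """

    for row in conn.cursor().execute(query):
        print(row.database_name, row.physical_name,
              row.num_of_reads, round(row.avg_read_stall_ms, 2))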

What are acceptable SLAs?

If an order takes 50 ms to place, is that bad? Is 2.5 seconds good? How many orders per hour must be accepted, minimally? Are there stored procedures associated with these orders that should be watched and have certain thresholds put on them?
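
Once an SLA number is agreed on, it can be enforced as a threshold. A sketch of an alarm on the hypothetical order-placement metric from above, assuming boto3 and an example (not a recommended) threshold of 2.5 seconds:

    import boto3

    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")  # assumed region

    # Alarm if average order-placement time exceeds 2500 ms (example threshold only)
    # for three consecutive 5-minute periods.
    cloudwatch.put_metric_alarm(
        AlarmName="order-placement-too-slow",      # hypothetical alarm name
        Namespace="OrderApp",                      # matches the custom metric sketch above
        MetricName="OrderPlacementTime",
        Statistic="Average",
        Period=300,
        EvaluationPeriods=3,
        Threshold=2500.0,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="notBreaching",
    )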

Putting all of this information together allows you to forecast expected volume against current volume, and to see what you will need in order to meet your SLAs for the business.

How many places do I know that actually do this? You'd be surprised. It has taken a few of them years to put it in place, but many have these statistics readily viewable and available at a moment's notice. Is it easy to do? Yes, it's just time-consuming. If you don't want to put in the time, just buy a bigger/better hosted solution; if you want to build a repeatable diagnostic infrastructure that provides this information, be willing to customize it to your environment.

Summary

I can't speak for Amazon RDS, and in full transparency, I work for Microsoft, but there are other hosted solutions, such as Azure, that already monitor many of these system-level items (CPU, memory, IO, etc.) for you. That still leaves the onus on you to collect the information and create your own application-based metrics.

Each environment is different, and while the system-level attributes are a great start for monitoring, the whole picture is needed in order to make appropriate business decisions.