Postgresql – Why do the RDS read IOPS drop after a certain period of time

alter-tableamazon-rdsperformancepostgresql

I am trying to add non-null constraints on several columns in a table with about 70,000,000 rows. Each statement is just ALTER TABLE table_name ALTER COLUMN column_name SET NOT NULL;, and each statement takes about 4.5 minutes to complete.

During this time, the read IOPS spike to about 8,000, and stay there until the statement completes. However, after executing several of these statements, the IOPS drop to 2,000 and are capped there. At this point each statement takes four times as long. If I wait a while, I can get another 30 minutes or so at 8,000 IOPS.

At first I thought this was the burst credits being used and then regenerated, but

The burst balance never drops below 98% during this entire time.
According to the documentation, I should have a balance of about 2,970, not 8,000.

The database details are

Postgres 11.8 on AWS RDS.
Instance size is db.t3.large.
990 GB of GP storage.
No provisioned IOPS.

My questions are

What is going on here? What is the significance of the 8,000 and 2,000 numbers? I can't find documentation about this anywhere.
Why isn't burst balance being used here?
Is 4.5 minutes to add the constraint on a 70 million-row table reasonable? And in general, is there a more efficient way I can be doing this? I still have over 60 columns to alter.

Read IOPS

Write IOPS

Burst balance

CPU

Memory

Best Answer

I've not found the IO burst balance to work the way the documents state nor the monitoring of the burst balance to reflect reality, for t* class instances. So while I can't explain your observation, it does not surprise me.

And in general, is there a more efficient way I can be doing this?

A single ALTER TABLE can implement a number of actions, not just a single action. And it should generally scan the table only once, not once for each action.

ALTER TABLE table_name 
   ALTER COLUMN column_name1 SET NOT NULL,
   ALTER COLUMN column_name2 SET NOT NULL,
   ALTER COLUMN column_name3 SET NOT NULL,
   ALTER COLUMN column_name4 SET NOT NULL,
   ALTER COLUMN column_name5 SET NOT NULL...;

Related Solutions

Amazon-rds – IOPS on Amazon RDS

will select query count as 1 IOP or would it equal the number of records being returned?

No. Assuming that Amazon accounts IOPs accurately (after all, they are virtual machines), there are some abstractions between SQL and disk IO operations:

SQL Operations: can be seen with the com_select, com_insert, etc. counters on SHOW GLOBAL STATUS. Please note that if you have a cache like the query_cache, close to no disk operations will be done (because results will be returned directly from memory).
Handler ("row") operations: can be seen with the 'Handler_%'counters on SHOW GLOBAL STATUS
Engine operations: reading a single row, or inserting a single row (Handler_write) will be done differently for each engine, implying MySQL buffering or not. For example, for InnoDB you can check the GLOBAL STATUS variables for the read request to the buffer pool versus the disk read requests. A single row read may require several reads on indexes and different versions of the rows in the UNDO history before being returned.
Filesystem cache: data that is frequently read or written will usually be cached on memory by the operating system. This will speed up subsequent reads and writes by avoiding actual disk IO (specially on places like Amazon, where de IO can be on a local disk or over the network with non-deterministic response).
Disk IO: even when dealing with actual IO, writes are usually done in blocks.

I have not gone in detail about MySQL and OS internals, and even without doing that you can see that things are not obvious- in fact reducing IO and knowing why there is so much of it in some cases is one of the most important fields for database optimization.

An engine like InnoDB will require all of its pages to be written 3 or 4 times on insertion (one on the transaction log, one on the actual tablespace, one on the double write buffer and optionally, on the binary log)- and this is a simplification- it does not have into account index updates, metadata, statistics, etc. The best way to know how many IOPS you will need is to test on a particular setup. Even with some fake smaller tests it will be more reliable than tying to theorise. Caching both at OS and DB level will make the numbers dramatically different.

For example, the other day I inserted dozens of millions of records at 200.000 rows/s with LOAD DATA (no SQL overhead) because I had enough memory to write almost exclusively to the InnoDB buffer pool. It took several minutes for the disk to synchronise with the memory contents, though.

Sql-server – Slow running aggregate of aggregate query

As you are repeating this query for multiple months then you will be continually re-aggregating the same rows.

For example the rows in the first month will always be brought back by the t1.post_date < @report_date criteria so will be re-processed for every month.

To avoid this I'd probably consider working through it in an iterative way a month at a time from the start. Dependent on the volatility of historic data I might also consider storing the pre-calculated results in the database rather than re-calculating these each month.

To calculate this at run time you could create a temporary table with the following structure.

CREATE TABLE #balance
  (
     department_id   INT NOT NULL,
     location_id     INT NOT NULL,
     account_id      INT NOT NULL,
     balance_to_date MONEY NOT NULL,
     PRIMARY KEY (department_id, location_id, account_id)
  );

You could also consider adding the following index on your transactions table

ALTER TABLE transactions
  ADD post_date_year_month AS (10000 * YEAR(post_date) + MONTH(post_date))

CREATE INDEX ix
  ON transactions(post_date_year_month, department_id, location_id, account_id)
  INCLUDE (amount)

Then extract a month at a time from transactions and merge into #balance (with a when matched then increment, when not matched insert).

The leading post_date_year_month column means that as long as you write the query sargably the extraction of each month can be done efficiently and the extracted rows for a month will be ordered by department_id, location_id, account_id making a merge join against #balance possible without a sort.

Whilst that could benefit this particular query you'd need to assess the utility of this index against your overall workload.

Then calculate the department_id, location_id totals from #balance (can leverage the PK order to avoid a sort) and store those somewhere and move onto the next month.

(Or possibly instead of #balance you could use a "temporary" permanent table balance and create an indexed view on that to avoid the separate explicit aggregation step and just copy the values straight from that before moving on)

Best Answer

Related Solutions

Amazon-rds – IOPS on Amazon RDS

Sql-server – Slow running aggregate of aggregate query

Related Question