MySQL – How to estimate the I/O operations performed by MySQL queries

MySQL

On Amazon's RDS, you pay about $0.10 per 1 million I/O requests. I've noticed a very high number (in the hundreds of thousands) of I/O requests for a very low traffic database. On seeing this, I did some further research and found this question, in which the asker's 6,000-user website was generating 800 million I/O requests a month, which would cost him about $80 / month.

Therefore, I want to know in advance how many I/O operations a MySQL query would generate, and how to optimize / minimize them. Is there any way to estimate how many I/O operations a query would perform, and any general rules I can follow to keep them as low as possible?

Best Answer

Community Wiki answer generated from comments on the question by Raymond Nijland

Use EXPLAIN. This way you can see if a query MAY need disk I/O. In the Extra column, you want to avoid "Using temporary" and "Using temporary; Using filesort" (note that filesort is a misleading name: if the result set fits in memory, a quicksort is run in memory).
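As an illustration, here is a small Python sketch that scans EXPLAIN output for those Extra markers. The row format (one dict per table, as a dictionary cursor might return) and the helper name are assumptions for the example, not a MySQL API:

```python
# Sketch: flag EXPLAIN rows whose Extra column signals a possible
# disk-based temporary table or filesort. The dict-per-row shape is
# an illustrative assumption (e.g. a driver's dictionary cursor).

RISKY_EXTRA = ("Using temporary", "Using filesort")

def rows_that_may_hit_disk(explain_rows):
    """Return the EXPLAIN rows whose Extra column contains a risky marker."""
    flagged = []
    for row in explain_rows:
        extra = row.get("Extra") or ""
        if any(marker in extra for marker in RISKY_EXTRA):
            flagged.append(row)
    return flagged

# Example (abridged) plan for a GROUP BY ... ORDER BY query:
plan = [
    {"table": "orders", "type": "ALL", "Extra": "Using temporary; Using filesort"},
    {"table": "users", "type": "eq_ref", "Extra": "Using index"},
]
print(rows_that_may_hit_disk(plan))  # only the "orders" row is flagged
```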

These are most likely caused by subqueries, unions, ORDER BY, GROUP BY, and so on. If the result set is big, a disk-based MyISAM temporary table is created, and if you then need to sort it, the sort is performed through I/O reads and writes.

In Internal Temporary Table Use in MySQL you can read when MySQL needs to create a disk-based MyISAM table. You can use avg_row_length * rows to check whether the result fits in the heap (though note that the rows value from EXPLAIN is not exact with the InnoDB engine). See SHOW TABLE STATUS Syntax.
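That avg_row_length * rows check can be sketched as a quick back-of-the-envelope calculation. The 16 MiB defaults below mirror MySQL's stock tmp_table_size / max_heap_table_size, but check your server's actual settings; the inputs come from SHOW TABLE STATUS / EXPLAIN and are only approximations (especially on InnoDB):

```python
# Sketch: estimate whether a result set fits in an in-memory temporary
# table. The in-memory cap is min(tmp_table_size, max_heap_table_size);
# 16 MiB is MySQL's historical default for both, but your server may differ.

DEFAULT_TMP_TABLE_SIZE = 16 * 1024 * 1024
DEFAULT_MAX_HEAP_TABLE_SIZE = 16 * 1024 * 1024

def result_fits_in_heap(avg_row_length, rows,
                        tmp_table_size=DEFAULT_TMP_TABLE_SIZE,
                        max_heap_table_size=DEFAULT_MAX_HEAP_TABLE_SIZE):
    """True if avg_row_length * rows stays under the in-memory temp-table cap."""
    cap = min(tmp_table_size, max_heap_table_size)
    return avg_row_length * rows <= cap

print(result_fits_in_heap(avg_row_length=120, rows=50_000))    # ~5.7 MiB -> True
print(result_fits_in_heap(avg_row_length=1_000, rows=200_000)) # ~190 MiB -> False
```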

In general, is InnoDB or MyISAM better for avoiding I/O requests?

InnoDB buffers both table data and index data, whereas MyISAM only buffers index blocks. An I/O to the table data is needed whenever the EXPLAIN Extra column doesn't say "Using index".

If both are using indexes: with InnoDB, if the buffer pool is warm, the data can be served from memory. If index blocks must come from disk, there is a formula you can use to estimate the I/O reads needed for selects, inserts, and updates. From Estimating Query Performance:

For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row:

log(rows) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1
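The formula can be evaluated directly. A minimal Python sketch, using the worked example values from the MySQL manual (500,000 rows, a 3-byte MEDIUMINT key, 4-byte data pointers, 1024-byte index blocks):

```python
import math

def estimated_btree_seeks(rows, index_block_length, index_length, data_pointer_length):
    """Estimated disk seeks to find a row via a B-tree index, per the formula above."""
    # Effective fan-out of one index block (blocks assumed ~2/3 full).
    fanout = index_block_length / 3 * 2 / (index_length + data_pointer_length)
    return math.log(rows) / math.log(fanout) + 1

# MySQL manual example: about 4 seeks for a 500,000-row table.
print(round(estimated_btree_seeks(500_000, 1024, 3, 4), 2))  # ≈ 3.87
```

In practice the upper levels of the B-tree are usually cached, so the real number of seeks is lower than this estimate.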

InnoDB indexes are larger because secondary (KEY) indexes store the PRIMARY / UNIQUE key columns alongside each entry. This makes lookups faster and can need even fewer I/O seeks, and you can also compress InnoDB data or indexes to reduce I/O further.