MySQL – How to estimate the I/O operations performed by MySQL queries

MySQL

On Amazon's RDS, you pay about $0.10 per 1 million I/O requests. I've noticed a very high number (in the hundreds of thousands) of I/O requests for a very low traffic database. On seeing this, I did some further research and found this question, in which the asker's 6,000-user website was generating 800 million I/O requests a month, which would cost him about $80 / month.

Therefore, I want to know in advance how many I/O operations a MySQL query would generate, and how to optimize / minimize them. Is there any way to estimate how many I/O operations a query would perform, and any general rules I can follow to keep them as low as possible?

Best Answer

Community Wiki answer generated from comments on the question by Raymond Nijland

Use EXPLAIN. This way you can see if a query MAY need disk I/O. In the Extra column, you want to avoid "Using temporary" and "Using temporary; Using filesort" (note that filesort is a misleading name: if the result set fits in memory, a quicksort is run in memory).
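As an illustration, here is a small Python sketch that scans EXPLAIN output for those Extra markers. The row format (one dict per table, as a dictionary cursor might return) and the helper name are assumptions for the example, not a MySQL API:

```python
# Sketch: flag EXPLAIN rows whose Extra column signals a possible
# disk-based temporary table or filesort. The dict-per-row shape is
# an illustrative assumption (e.g. a driver's dictionary cursor).

RISKY_EXTRA = ("Using temporary", "Using filesort")

def rows_that_may_hit_disk(explain_rows):
    """Return the EXPLAIN rows whose Extra column contains a risky marker."""
    flagged = []
    for row in explain_rows:
        extra = row.get("Extra") or ""
        if any(marker in extra for marker in RISKY_EXTRA):
            flagged.append(row)
    return flagged

# Example (abridged) plan for a GROUP BY ... ORDER BY query:
plan = [
    {"table": "orders", "type": "ALL", "Extra": "Using temporary; Using filesort"},
    {"table": "users", "type": "eq_ref", "Extra": "Using index"},
]
print(rows_that_may_hit_disk(plan))  # only the "orders" row is flagged
```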

These are most likely caused by subqueries, unions, ORDER BY, GROUP BY, and so on. If the result set is big, a disk-based MyISAM temporary table is created, and if you then need to sort it, the sort is performed through I/O reads and writes.

In Internal Temporary Table Use in MySQL you can read when MySQL needs to create a disk-based MyISAM table. You can use avg_row_length * rows to check whether the result fits in the heap (though note that the rows value from EXPLAIN is not exact with the InnoDB engine). See SHOW TABLE STATUS Syntax.
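That avg_row_length * rows check can be sketched as a quick back-of-the-envelope calculation. The 16 MiB defaults below mirror MySQL's stock tmp_table_size / max_heap_table_size, but check your server's actual settings; the inputs come from SHOW TABLE STATUS / EXPLAIN and are only approximations (especially on InnoDB):

```python
# Sketch: estimate whether a result set fits in an in-memory temporary
# table. The in-memory cap is min(tmp_table_size, max_heap_table_size);
# 16 MiB is MySQL's historical default for both, but your server may differ.

DEFAULT_TMP_TABLE_SIZE = 16 * 1024 * 1024
DEFAULT_MAX_HEAP_TABLE_SIZE = 16 * 1024 * 1024

def result_fits_in_heap(avg_row_length, rows,
                        tmp_table_size=DEFAULT_TMP_TABLE_SIZE,
                        max_heap_table_size=DEFAULT_MAX_HEAP_TABLE_SIZE):
    """True if avg_row_length * rows stays under the in-memory temp-table cap."""
    cap = min(tmp_table_size, max_heap_table_size)
    return avg_row_length * rows <= cap

print(result_fits_in_heap(avg_row_length=120, rows=50_000))    # ~5.7 MiB -> True
print(result_fits_in_heap(avg_row_length=1_000, rows=200_000)) # ~190 MiB -> False
```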

In general, is InnoDB or MyISAM better for avoiding I/O requests?

InnoDB buffers both table data and index data, whereas MyISAM only buffers index blocks. An I/O to the table data is needed whenever the EXPLAIN Extra column doesn't say "Using index".

If both are using indexes: with InnoDB, if the buffer pool is warm, the data can be served from memory. If index blocks must come from disk, there is a formula you can use to estimate the I/O reads needed for selects, inserts, and updates. From Estimating Query Performance:

For small tables, you can usually find a row in one disk seek (because the index is probably cached). For bigger tables, you can estimate that, using B-tree indexes, you need this many seeks to find a row:

log(rows) / log(index_block_length / 3 * 2 / (index_length + data_pointer_length)) + 1
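The formula can be evaluated directly. A minimal Python sketch, using the worked example values from the MySQL manual (500,000 rows, a 3-byte MEDIUMINT key, 4-byte data pointers, 1024-byte index blocks):

```python
import math

def estimated_btree_seeks(rows, index_block_length, index_length, data_pointer_length):
    """Estimated disk seeks to find a row via a B-tree index, per the formula above."""
    # Effective fan-out of one index block (blocks assumed ~2/3 full).
    fanout = index_block_length / 3 * 2 / (index_length + data_pointer_length)
    return math.log(rows) / math.log(fanout) + 1

# MySQL manual example: about 4 seeks for a 500,000-row table.
print(round(estimated_btree_seeks(500_000, 1024, 3, 4), 2))  # ≈ 3.87
```

In practice the upper levels of the B-tree are usually cached, so the real number of seeks is lower than this estimate.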

InnoDB indexes are larger because secondary (KEY) indexes store the PRIMARY / UNIQUE key columns alongside each entry. This makes lookups faster and can need even fewer I/O seeks, and you can also compress InnoDB data or indexes to reduce I/O further.