How Table Size Affects Performance in MySQL

myisam, MySQL, optimization, performance

Let's say I have a MyISAM table (amongst other tables) with 2938347 rows. This table is merely being used for logging and doesn't get emptied very often.

By not regularly archiving or deleting these rows, does having this table around affect server performance?

And what about tables/databases on a MySQL server that are massive in size but aren't accessed all that often? These tables co-exist with frequently accessed/queried tables, and I'm wondering whether they're causing problems.

Darius

Best Answer

This would depend upon the queries you issue (suppose the large table is mydb.logtable).

ASPECT #1

For starters, think about the MyISAM storage engine. It caches only index pages in the MyISAM key cache (sized by key_buffer_size). If a query against mydb.logtable reads a significant number of index pages from /var/lib/mysql/mydb/logtable.MYI, it may purge index pages belonging to other tables out of the cache. That would force subsequent queries against those other tables to reread their index pages from disk.
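You can watch for this kind of key cache churn with standard status counters; the miss-rate formula below (Key_reads / Key_read_requests) is the usual rule of thumb, not an exact measure of which table caused the eviction:

```sql
-- How big is the MyISAM key cache?
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';

-- Key_read_requests = index page reads (served from cache or disk)
-- Key_reads         = index page reads that had to hit disk
-- A miss rate (Key_reads / Key_read_requests) that jumps after large
-- queries against mydb.logtable suggests its index pages are pushing
-- other tables' pages out of the key cache.
SHOW GLOBAL STATUS LIKE 'Key_read%';
```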

ASPECT #2

A large MyISAM table that is infrequently accessed may carry stale index statistics, which can mislead the optimizer into full table scans down the road. During off hours, you should set up a crontab job that runs this SQL command:

ANALYZE TABLE mydb.logtable;

This will rebuild the index statistics.
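A minimal crontab entry for this might look like the following (the credentials file path and the 03:30 schedule are placeholders to adapt):

```
# m  h  dom mon dow  command -- rebuild logtable's index statistics nightly at 03:30
30 3 * * * mysql --defaults-extra-file=/root/.my.cnf -e "ANALYZE TABLE mydb.logtable;"
```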

ASPECT #3

Regardless of storage engine (or even RDBMS), as a rule of thumb any query that requests more than about 5% of a table's rows will cause the MySQL query optimizer to skip the indexes and do a full table scan. For example, if you query one year of data from a table containing less than 20 years of log info, it is very probable that a full table scan will happen.
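You can check which way the optimizer will jump before running such a query by prefixing it with EXPLAIN (the log_date column below is an assumed name for illustration):

```sql
-- In the EXPLAIN output, type: ALL with key: NULL means a full table
-- scan is planned; type: range with a key named means the index is used.
EXPLAIN SELECT *
FROM mydb.logtable
WHERE log_date >= '2012-01-01'
  AND log_date <  '2013-01-01';
```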

ASPECT #4

If the cardinality of the indexes is very low, a certain key combination could result in a query not using any indexes and do a full table scan.
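To spot a low-cardinality index, compare the Cardinality column of SHOW INDEX against the table's row count:

```sql
-- Cardinality is the optimizer's estimate of the number of distinct
-- values in the index; a value tiny relative to COUNT(*) marks a
-- low-selectivity index the optimizer may ignore.
SHOW INDEX FROM mydb.logtable;
SELECT COUNT(*) FROM mydb.logtable;
```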

For a demonstration of low cardinality affecting query performance and EXPLAIN plan generation, see my Nov 13, 2012 post Must an index cover all selected columns for it to be used for ORDER BY?

SUGGESTION

You should schedule some form of log rotation. Perhaps this:

-- Build a timestamped name for the archive copy
SET @tb1 = 'mydb.logtable';
SET @tb2 = CONCAT(@tb1,'_',DATE_FORMAT(NOW(),'%Y%m%d_%H%i%s'));
-- Rename the live table away, then recreate an empty one with the same structure
SELECT CONCAT('ALTER TABLE ',@tb1,' RENAME ',@tb2) INTO @sql_1;
SELECT CONCAT('CREATE TABLE ',@tb1,' LIKE ',@tb2) INTO @sql_2;
PREPARE s FROM @sql_1; EXECUTE s; DEALLOCATE PREPARE s;
PREPARE s FROM @sql_2; EXECUTE s; DEALLOCATE PREPARE s;

This will keep the live log table lean and mean, while the renamed copy preserves the entries collected so far.
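If you would rather keep the rotation inside MySQL instead of cron, the same statements can be wrapped in a stored procedure and fired by the event scheduler. A sketch (the procedure and event names are made up, and event_scheduler must be ON for the event to run):

```sql
DELIMITER $$
CREATE PROCEDURE mydb.rotate_logtable()
BEGIN
    -- Same rotation as above: rename the live table, recreate it empty
    SET @tb1 = 'mydb.logtable';
    SET @tb2 = CONCAT(@tb1,'_',DATE_FORMAT(NOW(),'%Y%m%d_%H%i%s'));
    SELECT CONCAT('ALTER TABLE ',@tb1,' RENAME ',@tb2) INTO @sql_1;
    SELECT CONCAT('CREATE TABLE ',@tb1,' LIKE ',@tb2) INTO @sql_2;
    PREPARE s FROM @sql_1; EXECUTE s; DEALLOCATE PREPARE s;
    PREPARE s FROM @sql_2; EXECUTE s; DEALLOCATE PREPARE s;
END$$
DELIMITER ;

-- Rotate once a week
CREATE EVENT mydb.ev_rotate_logtable
ON SCHEDULE EVERY 1 WEEK
DO CALL mydb.rotate_logtable();
```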