MySQL – Select Count on Big Table Takes 15 Minutes

amazon-rdsMySQLmysql-workbench

I try to run select Count on table with like 16M rows, with MySql 5.5 and workbench 6.2.
The query running almost 1000sec . Is it normal?
The query :

SELECT COUNT(1) FROM My_Table

when I run explain I see that mysql using the pk index.
Don't have other indexes or Unique values in the table.
I know I can use the estimation in ALL_TABLES , but I need accurate number.
Do you have any ideas ?


  • The DB locate on AMAZON AWS – RDS , and I using the defualt parameter group ,
    There are any changes you recommended to do?

Thanks

Best Answer

Yes, it could be normal. With a "large table", you are likely needing to do a lot of I/O (unless you're using InnoDB and you've got a huge buffer pool allocated.)

Unless you're paying for Amazon Provisioned IOPS, your I/O rate is going to be somewhat constrained on shared volumes.

http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html

So, two basic approaches to addressing the issue: 1) increase the rate at which you can perform I/O from storage (i.e. pay for Provisioned IOPS), and 2) re-organize your data so that requests don't require as much I/O. And that's going to depend somewhat on which storage engine you're using (MyISAM, InnoDB, or something else), but the design goal would be the same, get a much smaller index which MySQL can scan, rather than scanning the entire table. Use EXPLAIN to determine the execution plan MySQL is using, whether it's going to use your index.

Also, you might consider implementing a more advanced approach, for example, if there is a large portion of the table that is "static" history, where the count of those rows won't change. Get the count of those rows separately, and store that result.

For example, all rows that have a date column before a given value... get that count and store it in a small table. This would only have to be done once:

 CREATE TABLE my_counts (before_dt DATE, mycount BIGINT); 
 INSERT INTO my_counts (before_dt, mycount) 
 SELECT '2014-11-01' AS before_dt, COUNT(1) AS mycount
  FROM mytable t 
 WHERE t.dt < '2014-11-01';

Then, to get an "up-to-date" count, you can reference that stored value, and then you'd only need to get a count of rows that aren't included in that count. An index on the dt column would be appropriate. Combine the result from:

SELECT COUNT(1)
  FROM my_table t
 WHERE t.dt >= '2014-11-01' 

With the result from:

SELECT mycount
  FROM my_counts c
 WHERE c.before_dt = '2014-11-01' 

That is, you can live with a "stale", cached count of a portion of the table.