MySQL – Difference between using count(*) and information_schema.tables for counting rows

Tags: count, mysql, mysql-5.6

I want a fast way to count the number of rows in my table that has several million rows. I found the post "MySQL: Fastest way to count number of rows" on Stack Overflow, which looked like it would solve my problem. Bayuah provided this answer:

SELECT
    table_rows "Rows Count"
FROM
    information_schema.tables
WHERE
    table_name="Table_Name"
AND
    table_schema="Database_Name";

I liked this answer because it looks like a lookup rather than a scan, so it should be fast, but I decided to test it against

SELECT COUNT(*) FROM table 

to see how much of a performance difference there was.

Unfortunately I'm getting different answers as shown below:

[Screenshot: the information_schema query and SELECT COUNT(*) return different row counts]

Question

Why are the answers different by roughly 2 million rows? I am guessing the query that performs a full table scan is the more accurate number, but is there a way I can get the correct number without having to run this slow query?


I ran ANALYZE TABLE data_302, which completed in 0.05 seconds. When I ran the query again, I got a much closer result of 34384599 rows, but it's still not the same number as SELECT COUNT(*), which returns 34906061 rows. Does ANALYZE TABLE return immediately and continue processing in the background? I feel it's worth mentioning that this is a test database and is not currently being written to.

Nobody is going to care if it's just a case of telling someone how big a table is, but I wanted to pass the row count to a bit of code that would use that figure to create "equally sized" asynchronous queries to query the database in parallel, similar to the method shown in "Increasing slow query performance with the parallel query execution" by Alexander Rubin. As it is, I will just get the highest id with SELECT id FROM table_name ORDER BY id DESC LIMIT 1 and hope my tables don't get too fragmented.
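For illustration, the chunking step described above can be sketched as follows. This is a minimal sketch in Python with a hypothetical helper name, assuming an auto-increment primary key starting at 1; gaps left by deleted rows make the chunks equal in id span, not necessarily in row count, which is exactly the fragmentation trade-off mentioned.

```python
def id_chunks(max_id, n_chunks):
    """Split the id range [1, max_id] into n_chunks contiguous (lo, hi) ranges.

    Each range can then back a query such as
    SELECT ... FROM table_name WHERE id BETWEEN lo AND hi
    run asynchronously against the database.
    """
    step = max_id // n_chunks
    bounds = []
    lo = 1
    for i in range(n_chunks):
        # The last chunk absorbs the remainder so the whole range is covered.
        hi = max_id if i == n_chunks - 1 else lo + step - 1
        bounds.append((lo, hi))
        lo = hi + 1
    return bounds
```

With max_id taken from the ORDER BY id DESC LIMIT 1 query, each returned pair becomes one parallel worker's BETWEEN predicate.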

Best Answer

There are various ways to "count" rows in a table. Which is best depends on the requirements: how accurate the count needs to be, how often it is performed, whether we need a count of the whole table or one restricted by variable WHERE and GROUP BY clauses, and so on.

  • a) the normal way. Just count them.

    select count(*) as table_rows from table_name ; 
    

    Accuracy: a 100% accurate count at the time the query is run.
    Efficiency: Not good for big tables. (For MyISAM tables it is spectacularly fast, but hardly anyone uses MyISAM these days, as it has so many disadvantages compared to InnoDB. The "spectacularly fast" part also applies only when counting the rows of a whole MyISAM table; if the query has a WHERE condition, it still has to scan the table or an index.)
    For InnoDB tables it depends on the size of the table, as the engine has to scan either the whole table or a whole index to get an accurate count. The bigger the table, the slower it gets.

  • b) using SQL_CALC_FOUND_ROWS and FOUND_ROWS(). This can be used instead of the previous way if we also want a small number of the rows themselves (by changing the LIMIT). I've seen it used for paging: to fetch some rows and at the same time know how many there are in total, in order to calculate the number of pages. (Note that SQL_CALC_FOUND_ROWS and FOUND_ROWS() are deprecated as of MySQL 8.0.17.)

    select sql_calc_found_rows * from table_name limit 0 ; 
    select found_rows() as table_rows ;
    

    Accuracy: the same as the previous.
    Efficiency: the same as the previous.

  • c) using the information_schema tables, as in the linked answer:

    select  table_rows
    from    information_schema.tables
    where   table_schema = 'database_name'
      and   table_name = 'table_name' ;
    

    Accuracy: Only an approximation. If the table is the target of frequent inserts and deletes, the result can be way off the actual count. For InnoDB the stored value is an estimate based on sampling index pages, so it can differ from the true count even right after ANALYZE TABLE. Accuracy can be improved by running ANALYZE TABLE more often.
    Efficiency: Very good, it doesn't touch the table at all.

  • d) storing the count in the database (in another, "counter" table) and updating that value every single time the table has an insert, delete, or truncate (this can be achieved either with triggers or by modifying the insert and delete procedures).
    This of course puts additional load on each insert and delete, but it provides an accurate count.

    Accuracy: 100% accurate count.
    Efficiency: Very good, needs to read only a single row from another table.
    However, it puts additional load on the database.

  • e) storing (caching) the count in the application layer, and using the 1st method (or a combination of the previous methods). Example: run the exact count query every 10 minutes; in the meantime, between two counts, use the cached value.

    Accuracy: an approximation, but not too bad in normal circumstances (unless thousands of rows are added or deleted between two refreshes).
    Efficiency: Very good, the value is always available.
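The counter-table idea in option (d) can be sketched with triggers. The following is a minimal, hypothetical illustration using Python's built-in sqlite3 module rather than MySQL (the CREATE TRIGGER statements are very close to MySQL's syntax); note that in MySQL, TRUNCATE TABLE does not fire DELETE triggers, so the counter would have to be reset separately in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE data_302 (id INTEGER PRIMARY KEY, payload TEXT);
CREATE TABLE row_counts (table_name TEXT PRIMARY KEY, row_count INTEGER);
INSERT INTO row_counts VALUES ('data_302', 0);

-- Keep the counter in sync on every insert and delete.
CREATE TRIGGER data_302_ins AFTER INSERT ON data_302
BEGIN
  UPDATE row_counts SET row_count = row_count + 1
  WHERE table_name = 'data_302';
END;

CREATE TRIGGER data_302_del AFTER DELETE ON data_302
BEGIN
  UPDATE row_counts SET row_count = row_count - 1
  WHERE table_name = 'data_302';
END;
""")

conn.executemany("INSERT INTO data_302 (payload) VALUES (?)", [("x",)] * 5)
conn.execute("DELETE FROM data_302 WHERE id = 1")

# Reading the count is a single-row lookup; data_302 is never scanned.
count = conn.execute(
    "SELECT row_count FROM row_counts WHERE table_name = 'data_302'"
).fetchone()[0]
```

After five inserts and one delete, the counter table reports 4 rows without ever scanning data_302.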
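Option (e), the application-layer cache, can be sketched as a small wrapper (a hypothetical helper, not from the answer) that re-runs the exact count only when the cached value is older than the chosen interval:

```python
import time

class CachedCount:
    """Cache an expensive exact-count result for ttl seconds."""

    def __init__(self, count_fn, ttl=600):
        self.count_fn = count_fn      # callable that runs SELECT COUNT(*)
        self.ttl = ttl                # refresh interval in seconds
        self._value = None
        self._fetched_at = float("-inf")  # force a fetch on first use

    def get(self):
        now = time.monotonic()
        if now - self._fetched_at >= self.ttl:
            # Cache expired (or first call): run the exact count once.
            self._value = self.count_fn()
            self._fetched_at = now
        return self._value
```

Between refreshes every caller gets the stale-but-cheap cached value; choosing the ttl trades accuracy against database load, exactly as described above.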