MySQL COUNT(*) performance

Tags: count, mysql, performance, query-performance

I have a table with more than 15m rows. I need the total number of rows. So:

SELECT COUNT(*) FROM thetable;

This takes around 50 seconds to complete. Running EXPLAIN gives me Select tables optimized away. I suppose this means the result can be found using an index alone, so why does it still take so long? Here is some information about the index on the id column (it's non-nullable):

Index Type: BTREE (clustered)

Cardinality: 14623100

Unique: YES

How can I improve the performance of this query? Thanks.

Note: The database is MySQL 5.7.1, using the InnoDB engine.

EDIT:

Create statement:

CREATE TABLE `properties` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `address` varchar(255) DEFAULT NULL,
  `locality` varchar(50) DEFAULT NULL,
  `latitude` decimal(13,9) DEFAULT NULL,
  `longitude` decimal(13,9) DEFAULT NULL,
  `state` varchar(10) DEFAULT NULL,
  `created_at` datetime DEFAULT NULL,
  `updated_at` datetime DEFAULT NULL,
  .....
  PRIMARY KEY (`id`),
  KEY `index_properties_on_address` (`address`),
  KEY `index_properties_on_latitude` (`latitude`),
  KEY `index_properties_on_longitude` (`longitude`),
  KEY `index_properties_on_state` (`state`),
  KEY `index_properties_on_created_at` (`created_at`),
  .....
) ENGINE=InnoDB AUTO_INCREMENT=28267712 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED;

Note: I omitted some lines; there are 44 columns in total.

Explain plan:

+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered |            Extra             |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+
|  1 | SIMPLE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL | NULL     | Select tables optimized away |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+------------------------------+

Best Answer

Back when MySQL was not transactionally sound by default (when people regularly used MyISAM tables instead of InnoDB, because it was the default or, going further back in time, because InnoDB didn't exist yet), SELECT COUNT(*) FROM some_table without any filtering clauses was one of the query types people banged on about when claiming MySQL was much faster than other database engines. MyISAM stores an exact row count in its table metadata, so that query could be answered without reading a single row.

In a transactionally safe environment the database engine generally needs to check every row and make sure it should be visible to the current session: that it isn't part of a transaction that has not yet been committed (or that wasn't committed when this session's active transaction started), and that it isn't currently being rolled back. Checking every row implies a table scan or, where one is present, a clustered index scan.
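As an aside: if an approximate figure would do, one common workaround is to read InnoDB's own table statistics, which return instantly but are only estimates (often off by a sizeable margin):

-- Approximate row count from InnoDB statistics; no scan is performed,
-- but for InnoDB tables TABLE_ROWS is only an estimate.
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()   -- or name the schema explicitly
  AND TABLE_NAME = 'properties';

SHOW TABLE STATUS LIKE 'properties'; exposes the same estimate in its Rows column.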

It would be possible for the engine to keep track of the number of rows visible to each active session/transaction, but presumably the designers have not judged this to be worth the extra processing, so I assume it is not generally considered practical: I can imagine the fairly complex locking needed to deal with concurrency would harm the performance of other operations too much. You could implement this yourself by keeping a side table that records the row count of the table of interest and having all your code meticulously maintain that value, but this is quite some hassle and prone to bugs that let the count drift from the true value over time (and you are probably adding a potential deadlock source and/or locking bottleneck at the application layer).
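A rough sketch of that approach (the side table and trigger names are invented for illustration):

-- Side table holding the maintained count (one row per tracked table).
CREATE TABLE row_counts (
  table_name varchar(64) NOT NULL PRIMARY KEY,
  row_count  bigint      NOT NULL
);

-- Seed it once from a real (slow) count.
INSERT INTO row_counts
SELECT 'properties', COUNT(*) FROM properties;

-- Keep it in step on every insert and delete. Note that the single
-- counter row becomes a locking hot spot under concurrent writes,
-- which is exactly the bottleneck warned about above.
CREATE TRIGGER properties_count_ins AFTER INSERT ON properties
FOR EACH ROW
  UPDATE row_counts SET row_count = row_count + 1
  WHERE table_name = 'properties';

CREATE TRIGGER properties_count_del AFTER DELETE ON properties
FOR EACH ROW
  UPDATE row_counts SET row_count = row_count - 1
  WHERE table_name = 'properties';

Reading the count is then a primary-key lookup: SELECT row_count FROM row_counts WHERE table_name = 'properties';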

Situations where row-level security is in use complicate this even further: as well as checking the status of a row/page with respect to the current transaction, the engine needs to check it against the current user too, and since the security rules are dynamic it would be impractical to cache this information, further necessitating the scan every time, just in case. Row-level security is being added to MS SQL Server in the next release (https://msdn.microsoft.com/en-us/library/dn765131.aspx) and is already present in Postgres (http://www.postgresql.org/docs/9.5/static/ddl-rowsecurity.html); I don't know about its status in other RDBMSs.
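To illustrate the Postgres feature linked above, a minimal example (the table and policy names are made up):

-- PostgreSQL 9.5+: limit each user to rows they own.
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

CREATE POLICY user_rows ON accounts
  USING (owner = current_user);

Once a policy like this is active, even an unfiltered COUNT(*) has to evaluate the predicate against every row on behalf of the querying user.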