MySQL 8 – INT Primary Count(*) 13500 Times Slower in InnoDB

innodbMySQLmysql-8.0

I am aware there are many questions on this topic, mysql has not improved much on these single-query performance bugs in the past years but my case is more unusual.

If mysql would access the table index as it was supposed to do and count the entries it should take 70mil entries * 4 byte per entry = ~275MB at ~1000mb/sec = 270 milliseconds to count the entries
The actual mysql performance is about 13500 times slower than it should be.

Given that I have 100 gig of RAM for innodb I'd expect a subsequent count to be done in single digit milliseconds (not in again one hour)

I have the same behaviour on 3 servers, with slgihtly 3 different mysql versions on 5 different innodb tables, so it's not a localized issue.
I have had more severe cases than the current one (count takes about an hour, I've seen it going for days too).
In this case the table is compressed (which should actually increase the speed)

1590603 root    localhost       locdb1       Query   2004    Sending data    select count(*) from logos     1003611 0       0

EXPLAIN
{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "35130458.29"
    },
    "table": {
      "table_name": "logos",
      "access_type": "index",
      "key": "PRIMARY",
      "used_key_parts": [
        "id"
      ],
      "key_length": "4",
      "rows_examined_per_scan": 9851928,
      "rows_produced_per_join": 9851928,
      "filtered": "100.00",
      "using_index": true,
      "cost_info": {
        "read_cost": "34145265.49",
        "eval_cost": "985192.80",
        "prefix_cost": "35130458.29",
        "data_read_per_join": "2G"
      }
    }
  }
}


CREATE TABLE `logos` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `domain_name` varchar(64) COLLATE utf8_bin NOT NULL,
  `index_raw` mediumtext COLLATE utf8_bin,
  `a` mediumblob,
  `ab` varchar(3) COLLATE utf8_bin DEFAULT NULL,
  `b` mediumblob,
  `bb` varchar(3) COLLATE utf8_bin DEFAULT NULL,
  `last_error_code` smallint(6) DEFAULT NULL,
  `date_found` date NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=67945433 DEFAULT CHARSET=utf8 COLLATE=utf8_bin ROW_FORMAT=COMPRESSED

IBD Filesize is 370 GiB
Disk performance is reliable 24,000 IPS at max 1.2 GiB/sec
I've tried moving the table on a single non used disk and mysql was accessing it constantly at 4 MiB/sec (in that test case max speed would actually be 250MiB/sec at max 16k iops)
CPU is 20% loaded

The table in this case was freshly rebuilt, I've also tested this on freshly optimized tables without improvement.

I'd be happy with 5 seconds as well. I'm not happy with an hour.

Best Answer

Add INDEX(last_error_code), then do SELECT COUNT(*) FROM logos, it will run a lot faster because it will use the much smaller BTree for that secondary index.

SHOW TABLE STATUS LIKE 'logos' will give you an approximate count.

Why is the PK bloated? InnoDB orders the data by the PRIMARY KEY; hence the PK, as an INDEX is essentially included in the data. So to fetch a PK implies fetching the entire row. (Caveat: Bulky columns such as your texts and blobs are stored in a separate place, so they are not fetched when doing the COUNT(*).)

The Data+PK is in one B+Tree; each secondary index is in its own B+Tree.

Any "size" math (such as INT takes 4 bytes) needs to be multiplied by about 2 or 3 due to various overheads -- per column, per row, per block, etc. And the block nature has its own overhead due to block splits.

If the entire table happens to be cached in RAM (that is, in the innodb_buffer_pool), then there won't be any I/O. Still there is a non-trivial amount of CPU work to scan through your millions of rows.

I would guess that your table is slightly bigger than will fit in the buffer_pool. So, every time you do that COUNT(*) it is competing with all other queries for room in the cache. Not cool.

Note that the index I propose will take about 3GB, so it has a chance of being cached and perhaps even staying cached. The first time (after a restart) you do the COUNT(*), it will be slow because of lots of I/O. The second time (unless blocks have been bumped out of cache), it will be faster; perhaps ten times as fast. That will probably be less than an hour, but more than a few seconds.

What is the value of innodb_buffer_pool_size?

How do others solve the sluggishness of COUNT(*)? Have you noticed that search engines, if they say anything about 'how many', give you round numbers like "About 14,000,000 hits"? That's one approach -- periodically get the exact number and round it off until you get around to checking again.

Or they have, say, daily subtotals. Then the query SUMs up the daily counts. This is much faster. (But it only works for stuff that is continually coming in and not changing.)

Or ... (There are other techniques.) What is your use case?

The idea of "random areas, plus extrapolation" probably has a lot of hassles; I would not embark on such. Anyway, that is essentially what SHOW TABLE STATUS does.

MySQL 8.0.14 has Parallel scanning of by PRIMARY KEY (cf innodb_parallel_read_threads) for COUNT(*) w/o WHERE. That will provide a little speedup for your original query.