Mysql – Index dive/sampling row count very inaccurate in MySQL

indexinnodbMySQLoptimization

The table emp_address_phone is filled with many random values.

CREATE TABLE `emp_address_phone` (
  `address_no` int(11) NOT NULL,
  `emp_no` int(11) DEFAULT NULL,
  `street1` varchar(255),
  `street2` varchar(255),
  `landmark` varchar(255),
  `city` varchar(50),
  `state` varchar(50),
  `country` varchar(50),
  `phone` varchar(20),
  PRIMARY KEY (`address_no`),
  UNIQUE KEY `emp_no` (`emp_no`),
  UNIQUE KEY `phone` (`phone`)
) ENGINE=InnoDB AUTO_INCREMENT=21001 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

The query is on address_no column.

select * from emp_address_phone where address_no > 90;

There are really 6427 rows totally and all of them satisfy the criteria also. But the estimate was 3118.

From optimizer trace:

{
    "index": "PRIMARY",
    "ranges": [
      "90 < address_no"
    ],
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": true,
    "using_mrr": false,
    "index_only": false,
    "rows": 3118,
    "cost": 317.29,
    "chosen": true
}

I changed the number of pages that index dive algorithm samples to 300 from 20 in the cnf file. So server startup with innodb_stats_persistent_sample_pages value 300. I run Analyze table. But the estimate is still way off.

Trace when innodb_stats_persistent_sample_pages = 300

{
    "index": "PRIMARY",
    "ranges": [
      "90 < address_no"
    ],
    "index_dives_for_eq_ranges": true,
    "rowid_ordered": true,
    "using_mrr": false,
    "index_only": false,
    "rows": 3213,
    "cost": 322.81,
    "chosen": true
}

The total number of pages in clustered pk index was 289 pages. So all the 289 must have been used for estimation right? Why is it still way off?

Is the innodb_stats_persistent_sample_pages variable's purpose different? If so, is the number of pages that is read while index dive not controllable?

Also, innodb_buffer_pool_load_at_startup and innodb_buffer_pool_dump_at_shutdown are off.

Best Answer

Yes, the samples are inaccurate. A factor of 2 is actually somewhat common.

It could be that the "random" probe is not very random, and may repeat some pages.

OPTIMIZE TABLE, which is generally useless, would rearrange the data, probably enough to change the statistics.

Consider writing a bug report at bugs.mysql.com .

A side comment: A table with 3 unique indexes is extremely rare; it usually indicates a flaw in the schema design. Perhaps nepotism is not allowed in the company?