MySQL 8.0 table with JSON column that uses JSON “merge” operations has 500k rows but analyze table shows 7MM

amazon-rdsjsonMySQLstorage

I have a database table that according to TABLE STATUS has 7MM rows, but when I SELECT COUNT(*) it only has 500k rows.

This is a problem because table growth is increasing and we're running low on storage now.

here is schema: MySQL 8.0.15

CREATE TABLE `tasks` (
  `task_id` binary(24) NOT NULL,
  `task` json NOT NULL,
  `task_kryo` mediumblob,
  `task_type` varchar(180) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`task`,_utf8mb4'$.t'))) STORED,
  `created` datetime GENERATED ALWAYS AS (cast(left(json_unquote(json_extract(`task`,_utf8mb4'$._task.timestamp')),19) as datetime)) STORED,
  `last_updated` datetime GENERATED ALWAYS AS (cast(left(json_unquote(json_extract(`task`,_utf8mb4'$._task.latestStatus.timestamp')),19) as datetime)) STORED,
  `latest_status` varchar(180) COLLATE utf8_bin GENERATED ALWAYS AS (json_unquote(json_extract(`task`,_utf8mb4'$._task.latestStatus.t'))) STORED,
  `marker` binary(24) GENERATED ALWAYS AS (json_unquote(json_extract(`task`,_utf8mb4'$.marker'))) STORED,
  PRIMARY KEY (`task_id`),
  KEY `task_type_index` (`task_type`),
  KEY `created_index` (`created`),
  KEY `last_updated_index` (`last_updated`),
  KEY `latest_status_index` (`latest_status`),
  KEY `marker_index` (`marker`),
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

This is MySQL RDS on AWS. We have system backups disabled because this table is entirely disposable data. As a result, there is no binary log for this table. AWS disables that when you disable system backups apparently

My suspicion is that we use JSON "Merge" operations in Update statements, and because binary logging is disabled by AWS (we don't have any backups for this table as it is fully disposable / scratch data) somehow it is implementing the updates as inserts on the table (old records remain but are not deleted).

See
https://mysqlhighavailability.com/efficient-json-replication-in-mysql-8-0/

binlog-row-value-options=PARTIAL_JSON

Also, the MySQL log contains this warning on restart:

2020-05-21T20:02:08.506526Z 0 [Warning] [MY-013103] [Server] When binlog_row_image=FULL, the option binlog_row_value_options=PARTIAL_JSON will be used only for the after-image. Full values will be written in the before-image, so the saving in disk space due to binlog_row_value_options is limited to less than 50%.

Best Answer

This has more to do with InnoDB

As stated in the documentation to table status

The number of rows. Some storage engines, such as MyISAM, store the exact count. For other storage engines, such as InnoDB, this value is an approximation, and may vary from the actual value by as much as 40% to 50%. In such cases, use SELECT COUNT(*) to obtain an accurate count.

So COUNT(*) gives you the exact number.