Mysql – Why are the InnoDB tables larger than calculated

amazon-rdsinnodbmysql-5.7

I have a DB where some of the tables are much larger than I think that they should be. For 23 of 39 tables the following query overestimates their size, as it is designed to. But for 16 tables my intentional over-estimate is too small.

select table_name, 
    calculated_row_size, 
    avg_row_length, 
    avg_row_length / calculated_row_size as ratio_actual_to_calculated
FROM
(
    select tables.table_name,
        header + transaction_id + roll_pointer +  null_bit_vector_size +
        sum(case data_type  /* https://dev.mysql.com/doc/refman/5.7/en/storage-requirements.html#data-types-storage-reqs-date-time */
            WHEN 'enum' THEN 1 /* All my enums are small */
            WHEN 'char' THEN character_maximum_length /* Almost all data is supposed to be 1 byte */
            WHEN 'varchar' THEN IF(character_maximum_length < 255, 1, 2) + character_maximum_length /* This is an over-estimate, unless there is lots of multi-byte data */
            WHEN 'text' THEN IF(character_maximum_length < 255, 1, 2) + character_maximum_length /* This is an over-estimate, unless there is lots of multi-byte data */
            WHEN 'int' THEN 4
            WHEN 'datetime' THEN 8
            WHEN 'date' THEN 3
            WHEN 'time' THEN 3
            WHEN 'decimal' THEN 4 * ceil(numeric_precision/9) /* overestimates 10 digits actually use 5 bytes, etc */
        end) as calculated_row_size,
        tables.avg_row_length
    from 
        information_schema.tables
    cross join 
        ( 
            select
        /* https://dev.mysql.com/doc/refman/5.7/en/innodb-physical-record.html */
        5 header,
        6 transaction_id,
        7 roll_pointer 
        ) as dynamic_format_overheads
    inner join
        (
        select table_name, table_schema, CEILING(count(*)/8) null_bit_vector_size
        from information_schema.columns
        where is_nullable = 'YES'
        group by table_name, table_schema
        ) as null_bit_vector on null_bit_vector.table_name = tables.table_name and null_bit_vector.table_schema = tables.table_schema
    inner join 
    information_schema.columns on columns.table_name = tables.table_name and columns.table_schema = tables.table_schema
    WHERE
    tables.row_format = 'Dynamic'
    AND tables.table_schema = 'target_schema'
    AND tables.data_length > @@innodb_page_size /* exclude small tables, fragmentation is going to make them weird */
    group by tables.table_name
) ratio
WHERE  calculated_row_size < avg_row_length 
ORDER BY (avg_row_length / calculated_row_size) desc;

This database us using GUIDs stored as char(36) in UTF8MB4 for all of the primary keys, and therefore foreign keys. If I understand the manual page on dynamic row format then dynamic row format is actually using 1 byte per character to store the GUIDs, as the data is single byte characters.

I know that some of the tables whose size is larger than I can explain have no multi-byte characters in them. I calculate 452 bytes per row at 1 byte per character for this 2,000 page table which I know has no multi-byte characters but avg_row_length is actually 1576! At 4 bytes per character the calculation is 1700 bytes, so that isn't it either.

CREATE TABLE `cashbox_audit` (
`_deleted` enum('T','F') NOT NULL DEFAULT 'F',
`id` char(36) NOT NULL,
`cashbox_session_id` char(36) NOT NULL,
`identity_id` char(36) NOT NULL,
`cashbox_id` char(36) NOT NULL,
`audit_action` char(128) NOT NULL,
`balance` decimal(10,2) DEFAULT NULL,
`occurred_on` datetime NOT NULL,
`portal_id` char(36) NOT NULL,
`seller_id` char(36) DEFAULT NULL,
`cart_id` char(36) DEFAULT NULL,
`ticket_order_id` char(36) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_cashbox_audit_cashbox_session_id` (`cashbox_session_id`),
KEY `fk_cashbox_audit_cart_id_cart_id` (`cart_id`),
KEY `fk_cashbox_audit_cashbox_id_cashbox_id` (`cashbox_id`),
KEY `fk_cashbox_audit_identity_id_identity_id` (`identity_id`),
KEY `fk_cashbox_audit_portal_id_portal_id` (`portal_id`),
KEY `fk_cashbox_audit_seller_id_seller_id` (`seller_id`),
CONSTRAINT `fk_cashbox_audit_cart_id_cart_id` FOREIGN KEY (`cart_id`) REFERENCES `cart` (`id`),
CONSTRAINT `fk_cashbox_audit_cashbox_id_cashbox_id` FOREIGN KEY (`cashbox_id`) REFERENCES `cashbox` (`id`),
CONSTRAINT `fk_cashbox_audit_identity_id_identity_id` FOREIGN KEY (`identity_id`) REFERENCES `identity` (`id`),
CONSTRAINT `fk_cashbox_audit_portal_id_portal_id` FOREIGN KEY (`portal_id`) REFERENCES `portal` (`id`),
CONSTRAINT `fk_cashbox_audit_seller_id_seller_id` FOREIGN KEY (`seller_id`) REFERENCES `seller` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4

I verified the lack of multi-byte data by comparing WHERE LENGTH(audit_action) != CHAR_LENGTH(audit_action) for every character field.

What am I doing wrong?

Best Answer

As per MySQL 5.7 documentation MySQL Glossary, Limits on InnoDB Tables and Limits on Table Size Starting in MySQL 5.6, the page size for an InnoDB instance can be either 4KB, 8KB, or 16KB, controlled by the innodb_page_size configuration option. As of MySQL 5.7.6, InnoDB also supports 32KB and 64KB page sizes. For 32KB and 64KB page sizes, ROW_FORMAT=COMPRESSED is not supported and the maximum record size is 16KB.

The maximum row size for a given table is determined by several factors:

The internal representation of a MySQL table has a maximum row size limit of 65,535 bytes, even if the storage engine is capable of supporting larger rows. BLOB and TEXT columns only contribute 9 to 12 bytes toward the row size limit because their contents are stored separately from the rest of the row.

The maximum row size for an InnoDB table, which applies to data stored locally within a database page, is slightly less than half a page for 4KB, 8KB, 16KB, and 32KB innodb_page_size settings. For example, the maximum row size is slightly less than 8KB for the default 16KB InnoDB page size. For 64KB pages, the maximum row size is slightly less than 16KB. See Limits on InnoDB Tables.

If a row containing variable-length columns exceeds the InnoDB maximum row size, InnoDB selects variable-length columns for external off-page storage until the row fits within the InnoDB row size limit. The amount of data stored locally for variable-length columns that are stored off-page differs by row format. For more information, see Section 14.11, “InnoDB Row Storage and Row Formats”.

Row Size Limit Examples of InnoDB

The MySQL maximum row size limit of 65,535 bytes is demonstrated in the following InnoDB examples. The limit is enforced regardless of storage engine, even though the storage engine may be capable of supporting larger rows.

mysql> CREATE TABLE t (a VARCHAR(10000), b VARCHAR(10000),
       c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),
       f VARCHAR(10000), g VARCHAR(6000)) ENGINE=InnoDB CHARACTER SET latin1;
ERROR 1118 (42000): Row size too large. The maximum row size for the used 
table type, not counting BLOBs, is 65535. This includes storage overhead, 
check the manual. You have to change some columns to TEXT or BLOBs

The operation succeeds for an InnoDB table because changing a column to TEXT avoids the MySQL 65,535-byte row size limit, and InnoDB off-page storage of variable-length columns avoids the InnoDB row size limit.

 mysql> CREATE TABLE t (a VARCHAR(10000), b VARCHAR(10000),
       c VARCHAR(10000), d VARCHAR(10000), e VARCHAR(10000),
       f VARCHAR(10000), g TEXT(6000)) ENGINE=InnoDB CHARACTER SET latin1;
Query OK, 0 rows affected (0.02 sec)

Storage for variable-length columns includes length bytes, which are counted toward the row size. For example, a VARCHAR(255) CHARACTER SET utf8mb3 column takes two bytes to store the length of the value, so each value can take up to 767 bytes.