Mysql – Should I defragment InnoDB table with fixed-length fields

fragmentationinnodbmariadbMySQL

I am running MariaDB 10.2.13 on Linux (Debian).

I am setting up an InnoDB table that will record a lot of data but I will keep only the last one-hour rows. Thus, the number of rows will remain constant.

I should expect on this table:

  • many INSERTs
  • many UPDATEs
  • some DELETE (rows > 1 hour) from time to time

Example:

Table is defined with fixed-length fields only and some indexes.

CREATE TABLE `tbl_log` (
  `ip` int(4) unsigned NOT NULL,
  `date` datetime NOT NULL,
  `external_id` smallint(6) unsigned NOT NULL,
  `counter` smallint(6) unsigned NOT NULL DEFAULT 0,
  PRIMARY KEY (`ip`,`external_id`),
  KEY `external_id` (`external_id`),
  KEY `counter` (`counter`),
  KEY `date_idx` (`date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Inserts (and updates, through on duplicate key) may look like this (ip and external_id will vary):

INSERT INTO tbl_log
  SET ip = INET_ATON('192.168.1.1'),
      date = now(),
      external_id = 123,
      counter = 0
  ON DUPLICATE KEY UPDATE counter=counter+1;

Finally, deleting old rows will be done with a query:

DELETE FROM tbl_log WHERE date < DATE_SUB(NOW(), INTERVAL 1 HOUR);

Question:

Will such a table fragment over time?

If so, I think I should defragment it. If necessary, I planned to run OPTIMIZE TABLE tbl_log; (with option innodb-defragment=1) right after delete…

Best Answer

I would say yes for four(4) major reasons

  1. Page Boundaries / Index Pages
  2. Index Layout
  3. Index Statistics
  4. Bulk DML

PAGE BOUNDARIES / INDEX PAGES

According to the MySQL Documentation on Limits on InnoDB tables

The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a page for 4KB, 8KB, 16KB, and 32KB page sizes. For example, the maximum row length for the default innodb_page_size of 16KB is about 8000 bytes. For an InnoDB page size of 64KB, the maximum row length is about 16000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.

Your table uses INT UNSIGNED (4 bytes), DATETIME (8 bytes), 2 SMALLINTs (4 bytes) per row. That's a total of 16 bytes per row. This means you can fit 1000 rows into a single InnoDB page, but there is a very small piece of fragmentation. Using the default innodb_page_size (16K or 16384), there will be 384 bytes of fragmentation with a page. It is most likely used for mapping the data within the one page.

Doing mass INSERTs, DELETEs, and UPDATEs will generate many pages with empty space for whole rows that are marked as unused. This is in addition to the 384 bytes region of the data page.

INDEX LAYOUT

You will generate many index pages

  • You have a 6-byte PRIMARY KEY
  • Each secondary index carries a copy of the PRIMARY KEY
  • KEY external_id (external_id), will be 8 bytes (SMALLINT + PRIMARY KEY)
  • KEY counter (counter), will be 8 bytes (SMALLINT + PRIMARY KEY)
  • KEY date_idx (date) will be 14 bytes (DATE + PRIMARY KEY)
  • 36 bytes for indexing a single row
  • Indexes are represented as BTREEs, so you are looking at O(n log n) space utilization. If you account for non-leaf page splits, many index pages will be 50% empty (without question).

INDEX STATISTICS

Running INSERTs, DELETEs, and UPDATEs will skew index stats since index cardinalities will frequently change. You need to run ANALYZE TABLE tbl_log;. Running OPTIMIZE TABLE tbl_log; will defrag the tables and run ANALYZE TABLE tbl_log; as the last step (See my post from Feb 27, 2013 : What is the difference between optimize table and analyze table in mysql)

BULK DML

The DELETE query you will be doing by the hour is a bulk operation. Many rows will need a ton of housecleaning to keep up with marking rows deleted.