MySQL – Optimize Table by Removing Large Field

MySQLoptimization

I have a MySQL table of articles. It's growing rapidly, and over 50% of the size on disk is accounted for by the article body field, as you might expect. There are no indexes on the body, but 8/16 fields on the table are indexed.

My question is: going forward, will it be more efficient if I move the article body into a separate table? Will I gain more than I lose by forcing extra joins? Average body length is 3.3 kB and maximum length is 414 kB.

CREATE TABLE `articles` (
    `id` CHAR(32) NOT NULL DEFAULT '',
    `site_id` MEDIUMINT(8) UNSIGNED NULL DEFAULT NULL,
    `trend_id` CHAR(32) NULL DEFAULT NULL,
    `date_added` DATETIME NOT NULL,
    `date_scraped` DATETIME NULL DEFAULT NULL,
    `date_published` DATETIME NULL DEFAULT NULL,
    `date_shared` DATETIME NULL DEFAULT NULL,
    `language` VARCHAR(5) NULL DEFAULT NULL,
    `reach` MEDIUMINT(8) UNSIGNED NULL DEFAULT NULL,
    `title` TINYTEXT NULL,
    `summary` TEXT NULL,
    `keywords` TEXT NULL,
    `thumbnail` TEXT NULL,
    `body` MEDIUMTEXT NULL,
    `url` TEXT NOT NULL,
    `status` ENUM('published','pending','disabled') NOT NULL DEFAULT 'published',
    PRIMARY KEY (`id`),
    INDEX `site_id` (`site_id`),
    INDEX `date_published` (`date_published`),
    INDEX `language` (`language`),
    INDEX `reach` (`reach`),
    INDEX `trend_id` (`trend_id`),
    INDEX `date_added` (`date_added`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB
;

Best Answer

PARTITION is unlikely to help.

CHAR(32) -- is it utf8? If so, then it is taking 96 bytes; what a waste. And it is hex, correct? Consider converting to binary and using BINARY(16) to save space. Note that (in InnoDB) the PRIMARY KEY is included in each secondary key. So, 8 indexes times (96-16) bytes = 640 bytes per row saved.

In MySQL there are HEX() and UNHEX(). Your programming language probabls has something.

Yes, do consider "vertical partitioning" to move the body to another table (body + id; primary key is id):

  • Searching in the main table will be faster because of not tripping over the bodies.
  • The JOIN to get the body won't be bad, especially since you won't be fetching many bodies at once.
  • Consider making the body a MEDIUMBLOB and compressing (in the client) the text. This will give you a big savings in disk space. (And it is the only significant saving.) Text typically compresses 3:1.

Make a decision soon. Any change is likely to need both the old copy of the bodies and the new during the conversion. The 50% is scary.

If you have 8 indexes, are any of them "composite"? Are any just on a flag? Please provide SHOW CREATE TABLE.