MariaDB Query Performance – WordPress Query on MyISAM Tables in MariaDB on ZFS Takes Too Long

debianlinuxmariadb

I use a pretty common setup, some wordpress sites running on Debian 10 (Buster) with mariadb 10.3.27 on ZFS, on a decent server with SSDs and lots of RAM, pretty smooth operation with low CPU utilization. There is this query wordpress makes,

SELECT DISTINCT meta_key 
FROM wp_postmeta 
INNER JOIN  wp_posts ON post_id = ID 
WHERE post_type <> 'product' 
  AND post_type <> 'product_variation';

For some reason I cannot possibly imagine, on database A (InnoDB tables) it runs in about 3 seconds, and on database B (MyISAM tables) it takes over 2 minutes. wp_postmeta has 198938 records and wp_posts 25189 in database B, where on database A they both have many more records. I've tried dumping, deleting the database and recreating it to no avail. If I move the database to another similar server with EXT4, the query on MyISAM tables finishes in about 4 seconds. If I alter the tables to InnoDB, query execution finishes in about 4 seconds. Also, when running the query on MyISAM tables on ZFS I noticed the disk reads spike to about 50MB/s and up to 80MB/s while the query is running. 2 minutes of reads at 50MB/s is 6GB of data, in a database with a total size less than 500MB.

This difference in execution time between MyISAM and InnoDB was noticed on Debian systems with ZFS 0.6 and 0.7. The exact same databases on a FreeBSD/ZFS server display no such difference, with queries executed within a a few seconds for both MyISAM and InnoDB tables.

Does anybody know what could make such a huge difference between MyISAM and InnoDB on ZFS on Linux, and why ZFS on FreeBSD does not show this behavior?

The tables created as:

CREATE TABLE `wp_posts` (
  `ID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_author` bigint(20) unsigned NOT NULL DEFAULT 0,
  `post_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_date_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_title` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_excerpt` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_status` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'publish',
  `comment_status` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'open',
  `ping_status` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'open',
  `post_password` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `post_name` varchar(200) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `to_ping` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `pinged` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_modified` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_modified_gmt` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `post_content_filtered` longtext COLLATE utf8mb4_unicode_ci NOT NULL,
  `post_parent` bigint(20) unsigned NOT NULL DEFAULT 0,
  `guid` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `menu_order` int(11) NOT NULL DEFAULT 0,
  `post_type` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT 'post',
  `post_mime_type` varchar(100) COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `comment_count` bigint(20) NOT NULL DEFAULT 0,
  PRIMARY KEY (`ID`),
  KEY `post_name` (`post_name`(191)),
  KEY `type_status_date` (`post_type`,`post_status`,`post_date`,`ID`),
  KEY `post_parent` (`post_parent`),
  KEY `post_author` (`post_author`)
) ENGINE=MyISAM AUTO_INCREMENT=33318 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

CREATE TABLE `wp_postmeta` (
  `meta_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `post_id` bigint(20) unsigned NOT NULL DEFAULT 0,
  `meta_key` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `meta_value` longtext COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`meta_id`),
  KEY `post_id` (`post_id`),
  KEY `meta_key` (`meta_key`(191))
) ENGINE=MyISAM AUTO_INCREMENT=279035 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

Best Answer

Short answer

(If you must use MyISAM, then the indexes recommended there are not adequate. Let me know if you need more help.)

Long Answer

Several dimensions to this question:

  • Version -- MariaDB 10.3 versus other versions of MySQL/MariaDB. This is probably not significant.

  • OS -- Linux vs FreeBSD vs Windows -- The OS is just a middleman; all are quite mature and have mostly caught up with each other.

  • HDD vs SDD -- SDD drives are inherently faster for any access. Plus they are faster for random accesses due to the lack of a physical "seek".

  • FileSystem -- ZFS versus many others. "Journaling" and other advanced techniques will have some impact on the effective performance.

  • RAID, and especially "battery-backed write cache" has some impact on reads and possibly a big impact on writes. This applies across all disk operations in all applications.

  • InnoDB vs MyISAM -- These differ significantly in locking, caching, the organization of data. Indexes differ a lot. In MyISAM, fetching a block of data uses the FS's blocksize and caches it in the OS. A MyISAM index block is 1KB, which is likely to be smaller than the FS block slide, thereby leading perhaps to an extra level of caching. For InnoDB, all blocks are 16KB. The interactions between these MariaDB block size and the FS block is hard to predict. Peformance at many levels depends on whether the data or index needs to be scanned linearly or probed randomly.

  • The INDEXes -- Above, I focused on this topic because it is where the main improvement can be had. In the situation here (accessing WP's postmeta), will have a significant and far-reaching performance impact.

Impact of INDEX

  • My recommendations include having this composite index: (post_id, meta_key) which is optimal for your slow query. Furthermore...
  • When using a secondary or primary index, MyISAM reaches into the data file with a random seek and fetches a "block", hence performance is dependent on FS and HDD/SSD.
  • With InnoDB, the PRIMARY KEY is "clustered" with the data, thereby eliminating the extra random fetch. Some of my recommendations take advantage of this efficiency.
  • With InnoDB, a secondary index requires a second lookup via the PK.
  • So, in general, it is beneficial to lay out the data in some clustered manner to improve performance. Changing the PK to start with post_id, helps in this respect. Furthermore, InnoDB avoids the random seek that MyISAM requires.
  • "Index prefixing" (eg, "meta_key(191)") is virtually useless. So, part of the performance enhance involves any of about 5 techniques to eliminate prefixing. I'm pretty sure MariaDB 10.2.2 has already fixed the issue; this would let you remove the (191) from any indexes having it. Second best is if you can assure that all keys will always be under 191 characters, then changing the definition of meta_key is a simple answer.
  • All of the above add up to decreasing the 6GB of disk activity, probably significantly. Sorry, but I cannot say specifically if ZFS has any impact.

The specific Query

  1. Scan the entire posts table. There is no getting around this. This will be a linear scan through the "data". The engines do this task quite differently, but neither engine has a strong advantage here.
  2. For each post row, reach into postmeta. This is where the engine and indexes make a significant difference.
  3. Dedup the list of meta_key values -- This may be done in RAM or on disk, at the whim of the Optimizer. You could change a setting to influence that decision, but it would have the side effect of potentially impacting the performance of all other queries, so I won't discuss it.