MySQL – Trying to find a way to fit an extremely large index into memory

index, memory, MySQL, partitioning

Description:

I am having trouble with the ever-growing data size in my MySQL database.

I am using Ejabberd with its MAM feature, which stores messages sent between users in an archive table, and that table keeps growing. It now takes over 10 seconds to run a query like

SELECT COUNT(*) FROM archive
    WHERE username = '<some_id>'
      and bare_peer = '<some_string>'
      and timestamp >= '<some_timestamp_in_microseconds>'

or

SELECT timestamp, XML, peer, kind, nick FROM archive
    WHERE username = '<some_id>'
      and bare_peer = '<some_string>'
      and timestamp >= '<some_timestamp_in_microseconds>'
      and timestamp <= '<some_timestamp_in_microseconds>'

These are very common queries that execute thousands of times each day, and since they are issued from within Ejabberd, I cannot change their syntax.

Current Situation:

  • Instance Specification: 8 core CPU, 64 GB RAM
    innodb_buffer_pool_size: 49392123904 bytes (roughly 49 GB)
  • With reference to this post, I estimated a memory requirement of 1005383M (roughly 1 TB).
  • The archive table size: 700 GB of data and ~200 GB of index, around 0.9 billion rows (yes, a lot of rows)
  • Here is the table creation SQL:
CREATE TABLE `archive` (
  `username` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
  `timestamp` bigint(20) unsigned NOT NULL,
  `peer` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
  `bare_peer` varchar(191) COLLATE utf8mb4_unicode_ci NOT NULL,
  `xml` text COLLATE utf8mb4_unicode_ci NOT NULL,
  `txt` text COLLATE utf8mb4_unicode_ci,
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `kind` varchar(10) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `nick` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  UNIQUE KEY `id` (`id`),
  KEY `i_username` (`username`) USING BTREE,
  KEY `i_timestamp` (`timestamp`) USING BTREE,
  KEY `i_peer` (`peer`) USING BTREE,
  KEY `i_bare_peer` (`bare_peer`) USING BTREE,
  FULLTEXT KEY `i_text` (`txt`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Discussion:

Given the information above, the one thing I can think of is to:

  • Partition the archive table by primary key (RANGE, 5 million rows per partition). From my understanding, since MySQL doesn't support FULLTEXT indexes on partitioned tables, I would have to drop the FULLTEXT index on the txt column, which I think is OK.

  • Unfortunately, since MySQL can only partition on primary key columns and I cannot change the SQL, the queries cannot take advantage of the partitioning directly. What I could do is drop entire partitions regularly, so that as much of the remaining index as possible fits into memory.

I am posting to get a second opinion on:

  1. Is this the best I can do given the above limitations?
  2. If so, how can I partition such a big table without downtime, possibly by using pt-online-schema-change?

Thank you all for your time.

Best Answer

Fitting the index into RAM is not a useful goal. Decreasing the number of index blocks each query has to touch is a useful goal.

"Using intersection" is not as fast as the following composite index. Both of your queries would benefit from

INDEX(username, bare_peer, timestamp)

When adding it, you can drop i_username since it is a prefix of this.
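As a rough sketch, the change could look like the statements below. The index name i_user_peer_ts and the literal values in the EXPLAIN are made up for illustration; everything else comes from the table definition in the question. Before the new index exists, EXPLAIN will likely report an index_merge with "Using intersect(...)"; afterwards it should pick the composite index.

```sql
-- Check the current plan first (placeholder values, not real data).
EXPLAIN SELECT COUNT(*) FROM archive
    WHERE username = 'alice'
      AND bare_peer = 'bob@example.com'
      AND timestamp >= 1500000000000000;

-- Add the composite index; the name is an arbitrary choice.
ALTER TABLE archive
    ADD INDEX i_user_peer_ts (username, bare_peer, timestamp);

-- Re-run the EXPLAIN above and confirm that `key` shows i_user_peer_ts.
-- Only then drop the now-redundant single-column index.
ALTER TABLE archive
    DROP INDEX i_username;
```

The two equality columns come first and the range column (timestamp) comes last, so each query can read one contiguous slice of the index instead of merging two large single-column scans.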

PARTITIONing, even if possible, is unlikely to improve performance.

Also, normalizing out the 4 names (in the same table?) would shrink this table significantly, thereby helping performance a little.
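To illustrate what normalizing could mean here, the sketch below moves the repeated name strings into a lookup table and stores small integer ids in their place. All table and column names (jid, jid_id, archive_normalized, username_id, bare_peer_id) are hypothetical, and only a few of the original columns are shown. Since Ejabberd's queries reference the string columns directly, this would need changes on the application side, so treat it as an illustration of the idea rather than a drop-in step.

```sql
-- Hypothetical lookup table: one row per distinct name string.
CREATE TABLE jid (
  jid_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  jid VARCHAR(191) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (jid_id),
  UNIQUE KEY u_jid (jid)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- Hypothetical slimmed-down archive: 4-byte ids replace varchar(191) values.
CREATE TABLE archive_normalized (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  username_id INT UNSIGNED NOT NULL,
  bare_peer_id INT UNSIGNED NOT NULL,
  `timestamp` BIGINT UNSIGNED NOT NULL,
  xml TEXT COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (id),
  KEY i_user_peer_ts (username_id, bare_peer_id, `timestamp`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

-- A lookup then resolves the names through joins (placeholder values).
SELECT a.`timestamp`, a.xml
  FROM archive_normalized AS a
  JOIN jid AS u ON u.jid_id = a.username_id
  JOIN jid AS p ON p.jid_id = a.bare_peer_id
 WHERE u.jid = 'alice'
   AND p.jid = 'bob@example.com'
   AND a.`timestamp` >= 1500000000000000;
```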

What version of MySQL are you running? If it is a recent one, adding the above index should be relatively non-invasive.
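A quick way to check the version, followed by a form of the ADD INDEX for the composite index above that requests an online build explicitly. This assumes MySQL 5.6 or later, where InnoDB accepts the ALGORITHM and LOCK clauses; the index name i_user_peer_ts is hypothetical. If the server cannot honor the request, the statement fails with an error instead of quietly blocking writes.

```sql
-- Report the server version.
SELECT VERSION();

-- Build the index in place while still allowing concurrent reads and writes.
ALTER TABLE archive
    ADD INDEX i_user_peer_ts (username, bare_peer, timestamp),
    ALGORITHM=INPLACE, LOCK=NONE;
```

Even as an online operation, building an index over roughly 0.9 billion rows will probably take a long time and need extra disk space, so it is worth trying on a replica or a copy first.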

More on designing indexes: http://mysql.rjweb.org/doc.php/index_cookbook_mysql