First, comparing your existing indexes to your example query, you are missing an index on just `accountid`. MySQL matches composite indexes left-most first, meaning you can have a composite index like this:
KEY `contactaccounts` (`contactid`,`accountid`,`journalcode`) USING BTREE,
and run a query that filters on `contactid`, and the `contactaccounts` index would be a potential index*. However, a query on `accountid` alone will not utilize the index, because `accountid` is not the left-most column.
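A quick way to see the left-most rule in action is `EXPLAIN`; the queries below are illustrative stand-ins for yours:

```sql
-- Filters on the left-most column of `contactaccounts`,
-- so that index is a candidate:
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE contactid = 42;

-- Filters only on the second column; `contactaccounts` cannot be used,
-- and EXPLAIN will typically report a full table scan (type: ALL):
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE accountid = 7;
```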
If you never search for `contactid` without an `accountid`, I would create the index like this:
DROP INDEX `contactaccounts` ON `tbl_sessions_2012`;
CREATE INDEX `accountscontact` ON `tbl_sessions_2012` (`accountid`,`contactid`,`journalcode`);
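With the reordered index in place, a query on `accountid` alone can use it, as can a query on both columns; you can verify with `EXPLAIN` (queries are illustrative):

```sql
-- `accountid` is now the left-most column, so this can use `accountscontact`:
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE accountid = 7;

-- Both columns form a left-most prefix; the order in the WHERE
-- clause does not matter:
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE accountid = 7 AND contactid = 42;
```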
Now, analyzing your other indexes: by the left-most rule, the first index of each pair below is a redundant prefix of the second and can be dropped in favor of it:
KEY `contactaccounts` (`contactid`,`accountid`,`journalcode`) USING BTREE
KEY `contactaccountloginmonths` (`contactid`,`accountid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
KEY `contactaccountcollections` (`contactid`,`accountcollectionid`,`journalcode`) USING BTREE
KEY `contactaccountcollectionloginmonths` (`contactid`,`accountcollectionid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
KEY `organisationaccounts` (`organisationid`,`accountid`) USING BTREE
KEY `organisationaccountloginmonths` (`organisationid`,`accountid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
KEY `organisationaccountcollection` (`organisationid`,`accountcollectionid`,`journalcode`) USING BTREE
KEY `organisationaccountcollectionsloginmonths` (`organisationid`,`accountcollectionid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
By keeping the duplicates you are using up a lot of space on indexes, and every `INSERT` and `UPDATE` must maintain each of them.
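If you decide to drop the redundant prefixes, the statements are straightforward; the index names come from the list above (`contactaccounts` is already dropped and rebuilt earlier in this answer):

```sql
-- Each of these is a left-most prefix of a longer index on the same
-- columns, so dropping it loses no lookup capability.
DROP INDEX `contactaccountcollections` ON `tbl_sessions_2012`;
DROP INDEX `organisationaccounts` ON `tbl_sessions_2012`;
DROP INDEX `organisationaccountcollection` ON `tbl_sessions_2012`;
```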
* I say potential because there is still the possibility that your index will not be used, due to the number of rows MySQL would have to scan. As your `account_id IN ()` list grows, MySQL may determine that it is faster to do a full table scan regardless of the index.
To elaborate on ypercube's correct comment: the problem is not fragmentation in the `VARCHAR` column itself, but in rows that contain variable-length data.
If I were to have a table with these values:
+---+-------+
| i | v |
+---+-------+
| 1 | some |
| 2 | small |
| 3 | text |
+---+-------+
And then issue:
UPDATE my_table SET v = 'dinosaur' WHERE i = 2
Then, depending on the storage engine, I might find that there is simply no room for the word 'dinosaur' in the existing row location, since rows were written sequentially and compactly to disk. That might mean relocating row #2, placing the 'dinosaur' text in an external location, or what have you.
On the other hand, if I were to issue:
UPDATE my_table SET v = 'a'
There would now be a lot of wasted space; I might be interested in reclaiming that space.
This is the fragmentation problem within rows.
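When the wasted space is worth reclaiming, rebuilding the table defragments it. A minimal sketch; for InnoDB, `OPTIMIZE TABLE` is internally mapped to a table rebuild:

```sql
-- Rebuilds the table and its indexes, compacting row storage.
OPTIMIZE TABLE my_table;

-- Equivalent explicit rebuild for an InnoDB table:
ALTER TABLE my_table ENGINE=InnoDB;
```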
Best Answer
I seriously doubt that it ever held true for InnoDB. Even for MyISAM (which really can have `FIXED`-length rows) it was rarely true.

Roughly speaking, this is the order (most overhead first). If the data is bigger (as in `CHAR` being padded), #1 is impacted. The rest pales into insignificance.

Note: There are 4 flavors of InnoDB `ROW_FORMAT`:

With `REDUNDANT`, a `CHAR(100) CHARACTER SET utf8mb4` always takes 400 bytes.
With `COMPACT` or `DYNAMIC`, it takes between 100 and 400 bytes.
With `COMPRESSED`, the padding spaces are highly compressible, but still take some space.

Allegedly, the "fixed" size of a `CHAR` (which is not all that fixed) tends to prevent block splits. But I kinda doubt it.

My Rule of Thumb: `CHAR` should almost always be restricted to truly fixed-length strings, and most of them are hex or ascii.

Note: latin1 works equally well as ascii; the 8.0 default of utf8mb4 does not.

(See also my answer to the first link. It starts "Most of the answers in this thread are 5 years old, written before InnoDB and utf8 were defaults. So, let me start over..")
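To ground the rule of thumb: reserve `CHAR` for values whose length truly never varies, and give those columns a 1-byte-per-character charset. A sketch with hypothetical column names:

```sql
-- Hypothetical table illustrating the rule of thumb above.
CREATE TABLE example (
    country_code CHAR(2)  CHARACTER SET ascii,  -- truly fixed length: 'US', 'GB'
    md5_hash     CHAR(32) CHARACTER SET ascii,  -- hex digest: always 32 chars
    uuid_text    CHAR(36) CHARACTER SET ascii,  -- dashed UUID: always 36 chars
    name VARCHAR(100) CHARACTER SET utf8mb4     -- variable length: use VARCHAR
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
```

With utf8mb4 those `CHAR` columns could reserve up to 4 bytes per character; with ascii (or latin1) they take exactly their declared length.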