MySQL – Varchar(255) Field vs Hash Field for Performance

Tags: checksum, index, MySQL, performance, varchar

Consider a table with millions of records and the following schema:

CREATE TABLE `foos` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `foo` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `foo_UNIQUE` (`foo`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

This query is executed many times every second:

SELECT 1 FROM foos WHERE foo=?

If no record is found, this query is executed:

INSERT INTO foos(foo) VALUES(?)

The average length of foo is less than 20 characters. To optimize performance, adding a hash field that holds md5(foo) and dropping the unique key on foo is being considered:

ALTER TABLE foos ADD COLUMN `hash` char(32) NOT NULL;
UPDATE foos SET `hash` = MD5(foo);
ALTER TABLE foos DROP INDEX foo_UNIQUE;
ALTER TABLE foos ADD INDEX `foo_IDX` (`hash`);

And the query will be changed as:

SELECT 1 FROM foos WHERE hash=?

If no record is found, this query is executed:

INSERT INTO foos(foo, hash) VALUES(?, ?)

The question is: will the SELECT query run faster, given that the average length of foo is less than 20 characters but the hash field is always 32 characters long?

Best Answer

[In this answer, I assume MD5 is used as the hashing function]

The answer is YES. Adding a "hash" field and querying it would run faster.

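Details: The index on a varchar(255) field has to allow for entries of the full declared length, i.e. 255 characters, even though the average length is 20. With utf8, that is up to 255*3 = 765 bytes per entry (plus the PK length). Note that InnoDB stores each varchar index entry at its actual length, so the full 765 bytes are consumed only where MySQL uses fixed-width rows, such as MEMORY tables; on InnoDB the size of the gain should be measured rather than assumed.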
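
The worst-case arithmetic, and one way to check how much space each index really takes, can be sketched in SQL. This is only a sketch: it assumes MySQL 5.6 or later with InnoDB persistent statistics enabled, read access to the mysql schema, and that the current database is the one containing foos. The figures are estimates that are refreshed by ANALYZE TABLE.

```sql
-- Worst-case key width for the index on foo: 255 characters * 3 bytes (utf8).
SELECT 255 * 3 AS max_foo_key_bytes;  -- 765

-- Refresh the persistent statistics before reading them.
ANALYZE TABLE foos;

-- Approximate on-disk size of each index of foos.
-- stat_name = 'size' is the number of pages the index occupies.
SELECT index_name,
       stat_value * @@innodb_page_size AS index_bytes
FROM mysql.innodb_index_stats
WHERE database_name = DATABASE()
  AND table_name = 'foos'
  AND stat_name = 'size';
```

Running the second query before and after the schema change gives a direct comparison of the foo index against the hash index.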

When adding a hash field, make sure it has a fixed length (32 characters in the case of MD5) and that its CHARSET is latin1, i.e. 1 byte per character. In this case, each entry in the index will be 32 bytes (plus the PK length).
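
A column definition along these lines might look like the following sketch. The table name foos_hashed and the index name hash_IDX are hypothetical, chosen so the example does not touch the real table, and the latin1_bin collation is an assumption (MD5() returns lowercase hex digits, so a binary comparison is sufficient).

```sql
-- Hypothetical copy of the schema with a fixed-length, single-byte hash column.
CREATE TABLE `foos_hashed` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `foo` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  -- CHAR(32) in latin1: exactly 32 bytes per value.
  `hash` char(32) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
  PRIMARY KEY (`id`),
  KEY `hash_IDX` (`hash`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- MD5() returns a string of 32 hex digits.
SELECT LENGTH(MD5('example')) AS hash_bytes;  -- 32
```

Declaring the character set on the column itself matters here, because the table default is utf8 and a plain char(32) would inherit it.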

If you want to guarantee the uniqueness of the foo field, add a unique index on the hash field rather than a regular index.
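
As a rough sketch of that setup, assuming foos already has a populated hash column and the regular index foo_IDX from the question (the name hash_UNIQUE and the literal 'example' are placeholders):

```sql
-- Swap the regular index for a unique one in a single ALTER.
ALTER TABLE foos
  DROP INDEX `foo_IDX`,
  ADD UNIQUE KEY `hash_UNIQUE` (`hash`);

-- Existence check: look up by hash, then confirm the original value.
SELECT 1 FROM foos WHERE `hash` = MD5('example') AND foo = 'example';

-- Insert; the row is skipped if a row with the same hash already exists.
INSERT IGNORE INTO foos (foo, `hash`) VALUES ('example', MD5('example'));
```

With the unique index in place, the INSERT IGNORE on its own can replace the SELECT-then-INSERT pair, since the duplicate check happens inside the index. One difference from the original unique key is worth keeping in mind: MD5 works on bytes, so values that utf8_unicode_ci treats as equal, such as 'abc' and 'ABC', produce different hashes and would both be accepted.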