I saw a database schema that looked like this:
CREATE TABLE foo (
    name_hash bigint,
    name varchar(500),
    a varchar(500),
    b varchar(500),
    ...
    PRIMARY KEY (name_hash),
    KEY ...
);
This seems to be an attempt to limit index size by using an 8-byte integer rather than a 100-byte string. When looking up a value by name, the application hashes it and then uses that hash in the SQL query instead of the name.
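To make the pattern concrete, here is a sketch of what the application side might look like. The hash scheme (the first 8 bytes of MD5, folded into a signed bigint) is an assumption for illustration, since the question doesn't say which hash is used, and sqlite3 stands in for a MySQL connection; the SQL has the same shape either way.

```python
import hashlib
import sqlite3  # stand-in for a MySQL connection; the queries have the same shape

def name_hash(name: str) -> int:
    # Assumed scheme: first 8 bytes of MD5 as a big-endian integer,
    # folded into the signed 64-bit range of a bigint column.
    h = int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")
    return h - (1 << 64) if h >= (1 << 63) else h

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (name_hash INTEGER PRIMARY KEY, name TEXT, a TEXT)")

name = "alice"
conn.execute("INSERT INTO foo VALUES (?, ?, ?)", (name_hash(name), name, "x"))

# The application hashes the search key and queries by the hash; re-checking
# the full name guards against two names colliding on the 8-byte hash.
row = conn.execute(
    "SELECT a FROM foo WHERE name_hash = ? AND name = ?",
    (name_hash(name), name),
).fetchone()
print(row)  # -> ('x',)
```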
This is rather tedious, and I'm not sure if it is necessary.
Is there a feature of MySQL InnoDB that does something similar, looking up a string by its much shorter hash so that the index fits in memory?
Or does it do something like that already?
Best Answer
InnoDB has no tool to do anything like what you describe.
InnoDB has a limit of 767 bytes per column in an index. That has room for a VARCHAR(255) in utf8, or a VARCHAR(191) in utf8mb4. Also, InnoDB wants to put long VARCHARs in a different block if the whole record is bigger than about 8KB. Will this be common? (Don't blindly use VARCHAR(500) when you can reasonably state a smaller limit.)

Plan A: Compress (in the client) the name and store it into VARBINARY(255). Assuming it is typical text, the compression will be about 3:1. Use that instead of the hash.

Plan B: Split the name into 2 or 3 columns so the index limit is abided by. (An ugly solution!)
Plan C: Change the 767 limit. (It's possible, but I forget the details at the moment.)
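Plan A's client-side compression might look like the sketch below, using Python's zlib as one possible codec; the 3:1 ratio depends on the actual text, so measure it against real names before committing to VARBINARY(255).

```python
import zlib

def pack_name(name: str) -> bytes:
    # Compress in the client; store the result in a VARBINARY(255) column
    # and index that column instead of a separate hash.
    return zlib.compress(name.encode("utf-8"))

def unpack_name(blob: bytes) -> str:
    return zlib.decompress(blob).decode("utf-8")

name = "a fairly long, repetitive name " * 5
blob = pack_name(name)
print(len(name.encode("utf-8")), "->", len(blob))  # repetitive text shrinks well
assert unpack_name(blob) == name
```

Note that compressed values preserve only equality: the index on the VARBINARY column supports exact-match lookups, not range scans or prefix searches on the original text.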
Something to note: Any decent "hash" is very random. That is, each new row will land in some random place in the table. Once the table exceeds innodb_buffer_pool_size, you will be doing more and more I/O, thereby slowing down.
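The randomness is easy to see. In the sketch below (same assumed MD5-based scheme as above, purely for illustration), consecutive, nearly identical names hash to widely scattered values, so each INSERT lands on an effectively random page of the primary-key index:

```python
import hashlib

def name_hash(name: str) -> int:
    # Hypothetical hash scheme for illustration: first 8 bytes of MD5
    # as a big-endian unsigned integer.
    return int.from_bytes(hashlib.md5(name.encode("utf-8")).digest()[:8], "big")

# Sequential names, scattered hash values -> scattered index pages.
hashes = [name_hash(f"user{i}") for i in range(10)]
for i, h in enumerate(hashes):
    print(f"user{i}", h)
```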
What will the SELECTs be like? Will you have other indexes on this table? Will you JOIN on name_hash? All of these may have a bearing on the 'best' way to design the schema.