What you are asking is a little daunting. Here is why:
Would it be faster to store a hash of the value as well and instead index and search on that?
Creating a hash column and indexing it sounds like a great idea. I suggested it back on March 03, 2013: Possible INDEX on a VARCHAR field in MySql (see Suggestion #3)
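As a minimal sketch of that suggestion (table name `mydata` and column names are hypothetical; MD5 is just one fixed-length hash you could use):

```sql
-- Add a fixed-length hash column and index it instead of the long VARCHAR
ALTER TABLE mydata
    ADD COLUMN val_hash CHAR(32) NOT NULL DEFAULT '',  -- MD5 yields a fixed 32-hex-char digest
    ADD INDEX idx_val_hash (val_hash);

-- Backfill the hash for existing rows
UPDATE mydata SET val_hash = MD5(val);
```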
Does that even make sense if the values are not guaranteed to be unique?
This would depend on the cardinality of that hash column. Since you said you will have millions of rows, let me express this in numerical terms:
Run SELECT COUNT(DISTINCT hashcolumn) ...
against the table. For a one-million-row table, this count should be greater than 20. In other words, each distinct value should have no more than 50,000 rows (5% of the table rows). Any value that has more than 50,000 rows will cause the MySQL Query Optimizer to dismiss the index from being used and make a full table scan the preferred method for that hash value.
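You can spot the offending values directly. A sketch, assuming the hypothetical `mydata`/`val_hash` names and the 5%-of-1M threshold above:

```sql
-- List hash values whose row count exceeds 5% of a one-million-row table;
-- any value returned here would push the optimizer toward a full table scan
SELECT val_hash, COUNT(*) AS rows_per_hash
FROM mydata
GROUP BY val_hash
HAVING COUNT(*) > 50000
ORDER BY rows_per_hash DESC;
```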
If hashing the values gives them a consistent length, would indexing that make queries faster?
I would say Yes and Perhaps at the same time. Why two answers? Indexing and using a hash column in place of a long column sounds brilliant against a MyISAM table, but you said you are using InnoDB.
When it comes to fixed vs variable-length text, I would go with MyISAM over InnoDB.
EPILOGUE
If the table is fairly-to-heavily used in transactions, the table must stay InnoDB; otherwise, you can take better advantage of your idea in MyISAM. Either way, you can go forward with the hash idea. Please make sure the PRIMARY KEY is a single integer column (BIGINT if you know you will exceed 2 billion rows; otherwise, INT). I would also do a major RAM upgrade and increase the InnoDB Buffer Pool size accordingly.
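Putting those recommendations together, a table definition might look like this (names and sizes are illustrative only):

```sql
-- Single-integer surrogate PRIMARY KEY, with the hash as a secondary index.
-- Use BIGINT UNSIGNED instead of INT if you expect more than ~2 billion rows.
CREATE TABLE mydata (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    val      VARCHAR(255) NOT NULL,
    val_hash CHAR(32)     NOT NULL,   -- fixed-length MD5 of val
    PRIMARY KEY (id),
    KEY idx_val_hash (val_hash)
) ENGINE=InnoDB;
```

Keeping the PRIMARY KEY to a single integer keeps every secondary index entry small, since InnoDB stores a copy of the primary key in each secondary index entry.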
This is just a guess, as I do not have all info, but you probably would be better by doing:
EXPLAIN SELECT STRAIGHT_JOIN
*
FROM
tusers PARTITION (p362) tu
JOIN users PARTITION (p362) u
ON u.group_id=tu.group_id
AND tu.email_address=u.email
AND tu.group_id = 362
WHERE
tu.application_id=253555;
Note the STRAIGHT_JOIN, which may not be needed (if it is needed, then I may have assumed wrongly), and the tu.group_id comparison (which, again, should not be needed).
Then using the following keys:
(tu.application_id, tu.group_id, tu.email_address)
(u.group_id, u.email)
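Those keys could be created like this (index names are my own; adjust if the tables already have overlapping indexes):

```sql
-- Composite index covering the WHERE filter and both join columns on tusers
ALTER TABLE tusers ADD INDEX idx_app_grp_email (application_id, group_id, email_address);

-- Composite index covering the join columns on users
ALTER TABLE users  ADD INDEX idx_grp_email (group_id, email);
```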
However, if the number of records to be returned is 2.5M, as your cardinality suggests, then do not expect this to be fast... this is pure I/O math.
There are many other things that strike me as potential problems, but I cannot say for sure without access to the system.
Those keys could be even more effective if you did not do a SELECT *.
Another thing is that VARCHAR(255) as a default column size is usually a bad idea; declare the length the data actually needs.
Best Answer
I have discussed having large PRIMARY KEYs for InnoDB before: What storage engine should I use for this MySQL table?. The effect would be bloated keys with linear growth of all secondary indexes, because the PRIMARY KEY reference carried by every Secondary Key Index entry would start getting bloated as well.
Looking at your question, you yourself are saying you will be expanding an email address from 20 to 64 characters. You will be bloating every non-unique index on the table in question that has an email address column. If the email address is itself the PRIMARY KEY, then every index on the table bloats, whether or not it contains an email address column. There are other viewpoints discussed in Mysql int vs varchar as primary key (InnoDB Storage Engine)?
It is bad enough that a CHAR field is faster to read than a VARCHAR field but at the expense of larger indexes (see my post What is the performance impact of using CHAR vs VARCHAR on a fixed-size field?). Doing this will certainly introduce configuration challenges.
Even if the email address is not indexed, the data pages of that InnoDB table will still experience bloating and possibly fragmentation nonetheless. Since you are hashing, I can presume you must be indexing it.
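If you do index the hash, remember that hashes can collide, so always re-check the full column in the lookup. A sketch, with hypothetical names:

```sql
-- Probe by the short, fixed-length indexed hash, then re-check the full
-- email column to eliminate hash collisions
SELECT *
FROM mydata
WHERE email_hash = MD5('user@example.com')
  AND email = 'user@example.com';
```

The index on `email_hash` narrows the search to a handful of rows; the equality test on `email` then guarantees correctness even when two addresses share a hash.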