I am working with a table that has a couple hundred million rows of data. It uses varchar(32) UUID's as it's keys both primary and foreign (there are two). I was wondering with this size of table, how much of each field do I need to index to keep good performance but not index every character of those three fields.
For example if I set an index length of 24 characters, will I see any noticeable difference in performance? I was hoping there might be some kind of algorithm out there that can be used to estimate performance based on length of field and length of index.
Currently the table is ~100GB in size, and the index is taking up over half of that space. I would like to recover some of that space, and reduce the growth rate of the table.
Best Answer
This is not an answer to your question (how to cut index length), but an alternative to reach the same objective: use less space in your DB.
If you want to optimize storage you should not store UUIDs as
varchar(32)
(norcharacter(32)
).You can store them as
binary(16)
with no loss of information, and save (at least) 50% of space. A UUID contains 128 bits of information. If stored as a text representing the hexadecimal value, you use 4 bits of information out of everycharacter
, which employs 8 bits assuming the text is encoded asASCII
orlatin1
. So, you waste half your available bits. Storing the information inbinary
format, you waste none.My response is heavily inspired in Storing UUID Values in MySQL Tables, which I would recommend you read.
The following example shows you how to store the same UUID data in the two different formats:
Both
uuid_a
anduuid_b
will be used to store the same UUID, encoded in two different ways.Assuming we have a table called
generator_64k
with 64k of data... we fillt
with lots ofuuid()
using:... and we add also a well known value:
You can then query the table and look for the known value using either of the representations:
You get, in both cases, the same result:
In practice, you would only use
uuid_b
in your table.By doing this, you halve the amount of space used to store your UUIDs (both in table and indexes), without having to resort to using a substring in your index (which can affect your performance, and still not save that much).
See all the code at dbfiddle here