A table has about 300M records. Each has a unique ID that starts with 13 to 14 digits (the leftmost digit is either 1 or 2), optionally followed by a string of up to 7 ASCII characters. Two examples are 1450123410123 and 21347854123024Asd_uhA.
Currently, I'm storing this field as a VARCHAR with a PRIMARY KEY, and am worried about the memory usage of the index. I'm thinking of splitting this field into two columns, a BIGINT and a VARCHAR. Will a PRIMARY KEY created as a combination of these two columns consume less memory?
Update: Here are the two possibilities. Please note that I care about insert and update performance; there will be only one huge SELECT, to export the data after a long time.
CREATE TABLE without_split (
    id      varchar(21) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    content text NOT NULL,
    PRIMARY KEY (id)
);
CREATE TABLE with_split (
    id_prefix bigint unsigned NOT NULL,
    id_suffix varchar(7) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,
    content   text NOT NULL,
    PRIMARY KEY (id_prefix, id_suffix)
);
Best Answer
MyISAM! Shame. Switch to InnoDB.
Don't worry about that PRIMARY KEY; it is too much hassle to split it up, etc. But do make sure it uses the 'appropriate' character set: CHARACTER SET ascii is enough here, and if the ID is case sensitive, then add on COLLATE ascii_bin.
Yes, there is a memory and disk penalty. But there are tradeoffs with speed, simplicity, etc.
(If it were 100 characters long, I might give a different answer.)
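A minimal sketch of that advice applied to the question's first table; the ENGINE clause is the only addition to the question's DDL:

CREATE TABLE without_split (
    id      varchar(21) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,  -- case-sensitive ASCII key
    content text NOT NULL,
    PRIMARY KEY (id)
) ENGINE=InnoDB;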
Numbers for MyISAM
In MyISAM, assuming the strings are half length on average:

PRIMARY KEY(decimal_14_0, varchar_7) needs 2*(7+(1+3.5)) bytes per row.
PRIMARY KEY(varchar_21) needs 2*(1+(13.5+3.5)) bytes per row.
The 2* is there because the key is stored in both the data and the index.
7 is the storage size of DECIMAL(14,0); the 1 is the VARCHAR length byte; 13.5 and 3.5 are the average lengths of the digit prefix and the suffix.

After cancelling out the common terms, the diff is 2*(13.5-7) = 13 bytes smaller per row (data+pk) for DECIMAL+VARCHAR. 300M*13 ≈ 4GB.
(The InnoDB computation is in the comments.)
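For completeness, this is the split-table variant the arithmetic above assumes, with DECIMAL(14,0) in place of the question's BIGINT. It is my rendering of the answer's column types, not DDL from the original post:

CREATE TABLE with_split_decimal (
    id_prefix decimal(14,0) NOT NULL,  -- 14 digits pack into 7 bytes
    id_suffix varchar(7) CHARACTER SET ascii COLLATE ascii_bin NOT NULL,  -- 1 length byte + contents
    content   text NOT NULL,
    PRIMARY KEY (id_prefix, id_suffix)
) ENGINE=MyISAM;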