First, comparing your existing indexes to your example query, you are missing an index on just `accountid`. MySQL matches composite indexes left-most first, meaning you can have a composite index like this:
KEY `contactaccounts` (`contactid`,`accountid`,`journalcode`) USING BTREE,
and run a query that filters on `contactid`, and the `contactaccounts` index would be a potential index*. However, a query on `accountid` alone will not utilize the index, because `accountid` is not the left-most column.
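A quick way to see the left-most rule in action is `EXPLAIN`; the queries below are illustrative stand-ins for yours:

```sql
-- Filters on the left-most column of `contactaccounts`,
-- so that index is a candidate:
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE contactid = 42;

-- Filters only on the second column; `contactaccounts` cannot be used,
-- and EXPLAIN will typically report a full table scan (type: ALL):
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE accountid = 7;
```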
If you never search for `contactid` without an `accountid`, I would create the index like this:
DROP INDEX `contactaccounts` ON `tbl_sessions_2012`;
CREATE INDEX `accountscontact` ON `tbl_sessions_2012` (`accountid`,`contactid`,`journalcode`);
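With the reordered index in place, a query on `accountid` alone can use it, as can a query on both columns; you can verify with `EXPLAIN` (queries are illustrative):

```sql
-- `accountid` is now the left-most column, so this can use `accountscontact`:
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE accountid = 7;

-- Both columns form a left-most prefix; the order in the WHERE
-- clause does not matter:
EXPLAIN SELECT * FROM tbl_sessions_2012 WHERE accountid = 7 AND contactid = 42;
```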
Now, analyzing your other indexes: by the left-most rule, the first index of each pair below is a redundant prefix of the second and can be dropped in favor of it:
KEY `contactaccounts` (`contactid`,`accountid`,`journalcode`) USING BTREE
KEY `contactaccountloginmonths` (`contactid`,`accountid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
KEY `contactaccountcollections` (`contactid`,`accountcollectionid`,`journalcode`) USING BTREE
KEY `contactaccountcollectionloginmonths` (`contactid`,`accountcollectionid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
KEY `organisationaccounts` (`organisationid`,`accountid`) USING BTREE
KEY `organisationaccountloginmonths` (`organisationid`,`accountid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
KEY `organisationaccountcollection` (`organisationid`,`accountcollectionid`,`journalcode`) USING BTREE
KEY `organisationaccountcollectionsloginmonths` (`organisationid`,`accountcollectionid`,`journalcode`,`logintypeid`,`actionTime`) USING BTREE
By keeping the duplicates you are using up a lot of space on indexes, and every `INSERT` and `UPDATE` must maintain each of them.
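If you decide to drop the redundant prefixes, the statements are straightforward; the index names come from the list above (`contactaccounts` is already dropped and rebuilt earlier in this answer):

```sql
-- Each of these is a left-most prefix of a longer index on the same
-- columns, so dropping it loses no lookup capability.
DROP INDEX `contactaccountcollections` ON `tbl_sessions_2012`;
DROP INDEX `organisationaccounts` ON `tbl_sessions_2012`;
DROP INDEX `organisationaccountcollection` ON `tbl_sessions_2012`;
```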
* I say potential because there is still the possibility that your index will not be used, due to the number of rows MySQL would have to scan. As your `account_id IN ()` list grows, MySQL may determine that it is faster to do a full table scan regardless of the index.
To elaborate on ypercube's correct comment: the problem is not fragmentation in the `VARCHAR` column itself, but in rows that contain variable-length data.
If I were to have a table with these values:
+---+-------+
| i | v |
+---+-------+
| 1 | some |
| 2 | small |
| 3 | text |
+---+-------+
And then issue:
UPDATE my_table SET v = 'dinosaur' WHERE i = 2
Then, depending on the storage engine, I might find that there is simply no room for the word 'dinosaur' in the existing row location, since rows were written sequentially and compactly to disk. That might mean relocating row #2, placing the 'dinosaur' text in an external location, or what have you.
On the other hand, if I were to issue:
UPDATE my_table SET v = 'a'
There would now be a lot of wasted space; I might be interested in reclaiming that space.
This is the fragmentation problem within rows.
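When the wasted space is worth reclaiming, rebuilding the table defragments it. A minimal sketch; for InnoDB, `OPTIMIZE TABLE` is internally mapped to a table rebuild:

```sql
-- Rebuilds the table and its indexes, compacting row storage.
OPTIMIZE TABLE my_table;

-- Equivalent explicit rebuild for an InnoDB table:
ALTER TABLE my_table ENGINE=InnoDB;
```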
Best Answer
I seriously doubt that it ever held true for InnoDB. Even for MyISAM (which really can have `FIXED`-length rows) it was rarely true.

Roughly speaking, this is the order (most overhead first). If the data is bigger (as in `CHAR` being padded), #1 is impacted. The rest pales into insignificance.

Note: There are 4 flavors of InnoDB `ROW_FORMAT`:

With `REDUNDANT`, a `CHAR(100) CHARACTER SET utf8mb4` always takes 400 bytes.
With `COMPACT` or `DYNAMIC`, it takes between 100 and 400 bytes.
With `COMPRESSED`, the padding spaces are highly compressible, but still take some space.

Allegedly, the "fixed" size of a `CHAR` (which is not all that fixed) tends to prevent block splits. But I kinda doubt it.

My Rule of Thumb: `CHAR` should almost always be restricted to truly fixed-length strings, and most of them are hex or ascii.

Note: latin1 works equally well as ascii; the 8.0 default of utf8mb4 does not.

(See also my answer to the first link. It starts "Most of the answers in this thread are 5 years old, written before InnoDB and utf8 were defaults. So, let me start over..")
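To ground the rule of thumb: reserve `CHAR` for values whose length truly never varies, and give those columns a 1-byte-per-character charset. A sketch with hypothetical column names:

```sql
-- Hypothetical table illustrating the rule of thumb above.
CREATE TABLE example (
    country_code CHAR(2)  CHARACTER SET ascii,  -- truly fixed length: 'US', 'GB'
    md5_hash     CHAR(32) CHARACTER SET ascii,  -- hex digest: always 32 chars
    uuid_text    CHAR(36) CHARACTER SET ascii,  -- dashed UUID: always 36 chars
    name VARCHAR(100) CHARACTER SET utf8mb4     -- variable length: use VARCHAR
) ENGINE=InnoDB ROW_FORMAT=DYNAMIC;
```

With utf8mb4 those `CHAR` columns could reserve up to 4 bytes per character; with ascii (or latin1) they take exactly their declared length.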