I know the column order in a compound index matters if you want to query by a subset of the indexed-columns, but if you're query specifies values for all the indexed columns, is there any performance benefit to having a high-cardinality column earlier than a low-cardinality column? I vaguely remember reading something that suggested that was the case because it narrows down the result set quicker, but I can't find anything to back that up now.
I'm using MySQL w/ InnoDB. InnoDB uses clustered indexes, which might be relevant to my question, but I think it only does so for primary keys, which my index is not. The table looks something like this:
CREATE TABLE `my_table` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`ref_a_id` int(10) unsigned NOT NULL,
`ref_b_id` int(10) unsigned NOT NULL,
`is_active` tinyint(1) DEFAULT '1',
PRIMARY KEY (`id`),
UNIQUE KEY `index_my_table_on_ref_a_ref_b_is_active` (`ref_a_id`, `ref_b_id`, `is_active`)
) ENGINE=InnoDB AUTO_INCREMENT=2818259 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
In relation to my question, imagine ref_a_id
has a higher cardinality than ref_b_id
.
Best Answer
It is an old wives' tale that cardinality matters.
INDEX(a,b)
andINDEX(b,a)
perform virtually identically when in a BTree. The depth of the BTree is the same.What does matter is that both columns are searched with
=
, as inWHERE a=12 AND b=45
.The
PRIMARY KEY
in InnoDB is a BTree, and so is each secondary index. The only difference is what else is in the leaf nodes. The PK's leaf nodes contains all columns; a secondary index's leaf nodes contains the columns of the PK.In InnoDB (not in some competing products), the PK is, by definition, clustered and
UNIQUE
.As for your table,...
Are you allowing to rows for each ref_a--ref_b combo? One 'active', one not? This seems unlikely.
Why have
id
at all? Why not promote theUNIQUE
key to be the PK?Is
is_active
everNULL
?See my tips on many:many mapping tables.