I need to optimize my MariaDB tables index.
I've read a lot of very interesting things on StackExchange, but I'm not sure to have the answer for my case.
In my case, I have a table named TableA
which contains more than 40 millions records (stats purposes) with these columns :
ID
(PK)A
(INT)B
(INT)C
(INT)D
(INT)
I make different queries on this table, with many of the 4 columns in the WHERE
clause.
For example, I can have these columns in WHERE
clause of the same query :
- A, B and C
- A and B
- A and D
- A, C and D
- B and D
- B, C and D
- C and D
- only A
- only B
- etc.
I can encounter every case.
What would be the best way to index that table which count more than 40 millions records ?
Should I make only one INDEX with all 4 columns ? Or 4 different indexes ?
This table is used everytime in my app, and contains sensitive data…(for stats purposes).
Thank you for your help !
Joffrey
Best Answer
There is no perfect answer. A possibly good answer is...
First, are the columns compared with
= constant
? Or a "range", such as> constant
? For now, I will assume only=
.How selective are the values? True/false values are not; let's ignore them.
Assuming the 'worst' case (always
=
, all are selective), let's take the 6 permutations of two columns:This list will handle all the one- and two-column tests efficiently, and at least help when you use 3 or 4 columns.
If some column is always used in a range, then never put anything after it in the
INDEX
. If, for example,b
is always tested via a range, change the 4th and 5th to(c,b)
and(d,b)
Other tips:
INT
. That allows 4 billion values and takes 4 bytes. Use a smaller datatype. 40M distinct values needsINT UNSIGNED
, but if you have something with only a million distinct values, considerMEDIUMINT UNSIGNED
(3 bytes, 0..16M). Etc. The rationale is that making the table and indexes smaller helps decrease I/O, hence improves speed.id
? Or is some combo of the columns unique and could become the PK, there by eliminationid
?a
andc
, they also testb
, then change(a, c)
to(a, c, b)
.WHERE
clause mentions only that column; live with it. That column, when combined with others, works fine.If you would like to discuss further, please provide some extra clues about the data and queries involved. (Note that I had to do hand-waving about flags and ranges.)