MySQL vs. MariaDB – Many Indexes vs. One Multi-Column Index

indexmariadbMySQLstatisticswhere

I need to optimize my MariaDB tables index.
I've read a lot of very interesting things on StackExchange, but I'm not sure to have the answer for my case.

In my case, I have a table named TableA which contains more than 40 millions records (stats purposes) with these columns :

  • ID (PK)
  • A (INT)
  • B (INT)
  • C (INT)
  • D (INT)

I make different queries on this table, with many of the 4 columns in the WHERE clause.
For example, I can have these columns in WHERE clause of the same query :

  • A, B and C
  • A and B
  • A and D
  • A, C and D
  • B and D
  • B, C and D
  • C and D
  • only A
  • only B
  • etc.

I can encounter every case.
What would be the best way to index that table which count more than 40 millions records ?
Should I make only one INDEX with all 4 columns ? Or 4 different indexes ?
This table is used everytime in my app, and contains sensitive data…(for stats purposes).

Thank you for your help !

Joffrey

Best Answer

There is no perfect answer. A possibly good answer is...

First, are the columns compared with = constant? Or a "range", such as > constant? For now, I will assume only =.

How selective are the values? True/false values are not; let's ignore them.

Assuming the 'worst' case (always =, all are selective), let's take the 6 permutations of two columns:

INDEX(a, b) -- for (a), (a,b), (a,b,c), (a,b,d), (a,b,c,d)
INDEX(a, c)
INDEX(a, d)
INDEX(b, c)
INDEX(b, d)
INDEX(c, d)

This list will handle all the one- and two-column tests efficiently, and at least help when you use 3 or 4 columns.

If some column is always used in a range, then never put anything after it in the INDEX. If, for example, b is always tested via a range, change the 4th and 5th to (c,b) and (d,b)

Other tips:

  • You say INT. That allows 4 billion values and takes 4 bytes. Use a smaller datatype. 40M distinct values needs INT UNSIGNED, but if you have something with only a million distinct values, consider MEDIUMINT UNSIGNED (3 bytes, 0..16M). Etc. The rationale is that making the table and indexes smaller helps decrease I/O, hence improves speed.
  • Do you need id? Or is some combo of the columns unique and could become the PK, there by elimination id?
  • If there are common patterns, extend some of the indexes. For example, if usually when users test a and c, they also test b, then change (a, c) to (a, c, b).
  • Use InnoDB.
  • If any column is "low cardinality", then no index will be used when the WHERE clause mentions only that column; live with it. That column, when combined with others, works fine.
  • Do not over-normalize. I hope that a,b,c,d are not normalizations of floats or datetimes. Note how this hooks into my comments about "ranges".

If you would like to discuss further, please provide some extra clues about the data and queries involved. (Note that I had to do hand-waving about flags and ranges.)