MySQL Create Index Semantics: Clustered vs. Nonclustered

clusteringindexMySQL

I'm new to databases and specifically MySQL, so pardon me if this is a simple question.

I'm trying to make sense of what happens with MySQL when I call CREATE INDEX. I know that primary keys are unique and correspond to a clustered index but what happens if I have a table where I have no primary key defined and create an index. I know by default the data structure is a B-tree but it's not clear if this corresponds to a clustered or nonclustered index in the typical sense.

Specifically I have a table that that has four columns as integers:

sales(item_id, time, price, location_id)

The only column that has unique values is time. If I create this table and then later run CREATE INDEX dx ON sales(item_id); what happens?

From here I'm lead to believe that when I create the table without the primary key, because time is unique that is the clustered index and so I'm creating a non-clustered index on item_id.

My questions are: is this correct? If so how can I create a clustered index on a non-unique column in MySQL?

Best Answer

(This assumes you are using `ENGINE=InnoDB.)

Your table must have a primary key: You have 3 choices:

  1. (preferred) You explicitly provide such.
  2. There is a UNIQUE index with non-null column(s). (Sloppy, just make it the PK)
  3. (not a good option) A hidden PK will be provided.

The PK will be clustered and unique. That is the only choice for a pk in MySQL.

No other index can be clustered. Again, that is by-definition.

There are 3 choices for structure of an index:

  • (the most common) B+Tree. (Everything else in this answer is B+Tree)
  • FULLTEXT -- for searching for words in text
  • SPATIAL -- for 2-dimensional searches, such as geographic

sales(item_id, time, price, location_id)

Perhaps you need PRIMARY KEY(item_id, time). But that assumes you will never have two sales for the same item at exactly the same time.

So, it might be safer to have a 5th column:

id INT UNSIGNED AUTO_INCREMENT NOT NULL,
PRIMARY KEY(id)

Let's see the SELECTs; from them we can decide what secondary index(es) you need. Or do it yourself here .

My questions are: is this correct? If so how can I create a clustered index on a non-unique column in MySQL?

That is a trick that very few people have discovered. Once you have something that is unique (such as the 5th column, above), do this instead of the PK above:

PRIMARY KEY(time, id),
INDEX(id)

Now, even it time is not unique, the PK is clustered and unique because of id tacked on. AUTO_INCREMENT does not require more than "being the first column in some index".