MySQL – Which columns should be indexed when all may be used in different search queries

database-design, index, innodb, mariadb, mysql

Background

I'm working on a website for a movie theater chain currently located in four different cities (it might expand in the future). They use the same single-database website for all cities, which means certain tables need a column that holds the ID of the city each row belongs to.

Right now I have three different tables:

  • Cinemas – Contains each city's cinema (ID and name).
  • Movies – Contains all movies that have been/will be shown at the cinemas.
  • Showtimes – Contains all showtimes for all movies in all cities.

The structure of the Showtimes table is the following:

Column Name   | Column Type  | Description
--------------+--------------+---------------
ID            | BIGINT       | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID      | TINYINT      | Foreign key bound to Cinemas.ID
MovieID       | BIGINT       | Foreign key bound to Movies.ID
Showtime      | DATETIME     | At what date and time the movie will show 

(will contain multiple rows for each movie, i.e. one row for each showtime)

How this table will be used

A user of the website must be able to:

  • View all current/upcoming movies and showtimes (sorted by date) in the selected city.

    Example query (backend):

    SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
    
  • Select a single movie and view all showtimes for that specific title only (in the selected city).

    Example query:

    SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
    
  • Select a single day and view all movies and showtimes for that day only (in the selected city).

    Example query:

    SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND Showtime >= [date 00:00] AND Showtime < [next date 00:00]
    

So naturally I decided that I needed to create indexes for the columns.

Problem

What I'm having trouble with is deciding how to index the columns properly. One index per column seems quite expensive[1] [2], so I started looking into composite indexes, which seem to be the right choice but also led to even more confusion.

From my understanding (based on what I've read), you should add the columns to the index in order of selectivity, making the most selective column (I'm guessing that means the one with the most unique values, i.e. the highest cardinality) the first in the composite index[3] (in my case that would be the Showtime column). The only problem is that the database can only use the index if the first column is included in the search query[4] [5], which it currently isn't in either of my required queries.

Question

What kind of index(es) should I apply to my columns in order to cover all usage scenarios? (the last scenario may be omitted, but the first two are required)

Should I use a composite index on all columns, for some columns, or do I need a separate index for each column?

This table is updated at most a few times per week to add new showtimes.

Footnotes

1 MySQL indexes – what are the best practices?

2 Indexing every column in a table

3 How important is the order of columns in indexes? (question)

4 How important is the order of columns in indexes? (#2 top-voted answer)

5 When should I use a composite index?

Best Answer

Composite Primary Key

I would define the primary key as a composite key of (CinemaID, MovieID, Showtime).

These 3 columns uniquely identify each row, and so having a separate ID column is not necessary.
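As a minimal sketch, the table could be declared like this. The column types are taken from the question; the NOT NULL constraints and the omission of the foreign keys are simplifications made here for brevity, not part of the original design.

```sql
-- Sketch only: types come from the question, foreign keys to
-- Cinemas.ID and Movies.ID are left out to keep the example short.
CREATE TABLE Showtimes (
    CinemaID TINYINT  NOT NULL,
    MovieID  BIGINT   NOT NULL,
    Showtime DATETIME NOT NULL,
    -- The three columns together identify a row, so no surrogate ID is needed.
    PRIMARY KEY (CinemaID, MovieID, Showtime)
) ENGINE=InnoDB;
```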

Composite (Secondary) Index

With this PK, the only additional index you will need for your queries is (CinemaID, Showtime).
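A sketch of adding that secondary index to an existing Showtimes table. The index name idx_cinema_showtime is a placeholder chosen for this example.

```sql
-- Secondary index for the queries that filter on CinemaID
-- and then sort or filter on Showtime.
ALTER TABLE Showtimes
    ADD INDEX idx_cinema_showtime (CinemaID, Showtime);
```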

Why these indexes?

A good way to think about how indexes are used is to picture them as the column order of a sorted spreadsheet.

Imagine a spreadsheet with (CinemaID, MovieID, Showtime) that is sorted by each column successively.

All your queries have CinemaID present, which means you can quickly find the "section" of the spreadsheet for that CinemaID. Then, for your query that searches by MovieID, you can easily find the "subsection" in the 2nd column, where MovieID matches the searched-for value.

As the 3rd column, Showtime, is also sorted, you can imagine how quick and easy it is to find all the showtimes for that movie in that cinema. The DBMS works in a similar way and can retrieve those results extremely quickly.
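One way to check this is to run the second query from the question through EXPLAIN. This is a sketch that assumes the composite primary key (CinemaID, MovieID, Showtime) is in place; the literal IDs are made-up values.

```sql
-- Showtimes for one movie in one cinema.
-- With the composite primary key, the plan would typically show
-- key = PRIMARY and no "Using filesort", because the rows are
-- already stored in Showtime order within each (CinemaID, MovieID).
EXPLAIN
SELECT Showtime
FROM Showtimes
WHERE CinemaID = 1
  AND MovieID = 42
ORDER BY Showtime;
```

The actual plan depends on the server version and table statistics, so treat the comment as an expectation to verify rather than a guarantee.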

As for your other two queries, they both filter on CinemaID and then use Showtime in some manner. They also need the MovieID in their results.

So, the (CinemaID, Showtime) index has you covered there. Again, the CinemaID easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.

Even better, since your primary key includes MovieID, that column is appended to every secondary index after the defined columns (at least for MySQL InnoDB; other engines do this too, but not necessarily all).

Think of that as a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup in the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, because the index already contains every column the query asks for, so it is even more efficient than a query that requires the double lookup.
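The same kind of check can be sketched for the other two queries, assuming a secondary index on (CinemaID, Showtime) exists alongside the composite primary key. The cinema ID and dates are made-up values.

```sql
-- All showtimes in one cinema, sorted by date.
EXPLAIN
SELECT MovieID, Showtime
FROM Showtimes
WHERE CinemaID = 1
ORDER BY Showtime;

-- All showtimes in one cinema on a single day (half-open range).
EXPLAIN
SELECT MovieID, Showtime
FROM Showtimes
WHERE CinemaID = 1
  AND Showtime >= '2024-01-15 00:00:00'
  AND Showtime <  '2024-01-16 00:00:00';
```

If the optimizer picks the secondary index, the Extra column should include "Using index", which indicates the query was answered from the index alone. Since every column of this table is also part of the primary key, the optimizer may choose either index, so it is worth checking the plan on real data.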

Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.

Afterthought

My assumption that these three columns uniquely identify each row may NOT hold if a movie is showing on multiple screens at the same time. If you want to identify those screens separately, then my solution is not the best fit (I can provide another solution for that situation, please just let me know.)