Background
I'm working on a website for a movie theater chain currently located in four different cities (might expand in the future). They use the same single-database website for all cities, which means I need a column in certain tables that holds the ID of the city each row belongs to.
Right now I have three different tables:
Cinemas – Contains each city's cinema (ID and name).
Movies – Contains all movies that have been/will be shown at the cinema.
Showtimes – Contains all showtimes for all movies in all cities.
The structure of the Showtimes table is the following:
Column Name | Column Type | Description
--------------+--------------+---------------
ID | BIGINT | (Primary) Unique ID for each showtime (perhaps unnecessary?)
CinemaID | TINYINT | Foreign key bound to Cinemas.ID
MovieID | BIGINT | Foreign key bound to Movies.ID
Showtime | DATETIME | At what date and time the movie will show
(will contain multiple rows for each movie, i.e. one row for each showtime)
How this table will be used
A user of the website must be able to:
- View all current/upcoming movies and showtimes (sorted by date) in the selected city.
Example query (backend):
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? ORDER BY Showtime
- Select a single movie and view all showtimes for that specific title only (in the selected city).
Example query:
SELECT Showtime FROM Showtimes WHERE CinemaID = ? AND MovieID = ? ORDER BY Showtime
- Select a single day and view all movies and showtimes for that day only (in the selected city).
Example query:
SELECT MovieID, Showtime FROM Showtimes WHERE CinemaID = ? AND (Showtime BETWEEN [date 12:00 AM] AND [date 11:59:59 PM])
So naturally I decided that I needed to create indexes for the columns.
Problem
What I'm having trouble with is determining how to index the columns properly. One index for each column seems quite expensive[1] [2], so I started looking into composite indexes, which seem to be the right choice but also led to even more confusion.
From my understanding (based on what I've read) you should add the columns to the index by order of selectivity, making the most selective (I'm guessing that means the most unique/with the most cardinality?) column the first in the composite index[3] (in my case that would be the Showtime column). The only problem with that is that the index can only be used by the database if the first column is included in the search query[4] [5], which it currently isn't in either of my queries.
Question
What kind of index(es) should I apply to my columns in order to cover all usage scenarios? (the last scenario may be omitted, but the first two are required)
Should I use a composite index on all columns, for some columns, or do I need a separate index for each column?
This table is updated at most a few times per week to add new showtimes.
Footnotes
1 MySQL indexes – what are the best practices?
2 Indexing every column in a table
3 How important is the order of columns in indexes? (question)
4 How important is the order of columns in indexes? (#2 top-voted answer)
Best Answer
Composite Primary Key
I would define the primary key as a composite key of (CinemaID, MovieID, Showtime). These 3 columns uniquely identify each row, and so having a separate ID column is not necessary.
Composite (Secondary) Index
With this PK, the only additional index you will need for your queries is (CinemaID, Showtime).
Why these indexes?
A good way to think about how indexes are used is to think of them as the order of columns in a spreadsheet.
Imagine a spreadsheet with (CinemaID, MovieID, Showtime) that is sorted by each column successively.
All your queries have CinemaID present, which means you can quickly find the "section" of the spreadsheet for that CinemaID. Then, for your query that searches by MovieID, you can easily find the "subsection" in the 2nd column, where MovieID matches the searched-for value.
As the 3rd column, Showtime, is also sorted, you can imagine how quick and easy it is to find all the showtimes for that movie, in that cinema. The DBMS does things in a similar way and can retrieve those results extremely quickly.
As for your other queries, they all start with CinemaID and then use Showtime in some manner. They also need the MovieID in their results.
So, the (CinemaID, Showtime) index has you covered there. Again, the CinemaID easily finds the "section" of the spreadsheet (in the analogy), and all possible showtimes (and there are going to be duplicates, assuming there is more than one screen) will be listed in order and are easily searched and/or sorted by those values.
Even better, since your primary key includes MovieID, that column is included in all secondary indexes after the defined columns (at least for MySQL InnoDB; other engines too, but not necessarily all).
Think of that as a 3rd column in the "spreadsheet" of our secondary index. The reason the column exists is to have all parts of the primary key available to do a lookup to the main table (aka the clustered index, in InnoDB), if necessary. In this simple case, no lookup is needed, so it is even more efficient as it doesn't require that double lookup.
Using just this primary key and a single secondary index, you should obtain excellent performance on any of the queries you listed.
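To make that concrete, here is a minimal sketch of the table as described above. It assumes MySQL with InnoDB and takes the column types from the question; the index name idx_cinema_showtime, the literal IDs and the dates are made up for illustration, and the foreign key constraints are left out to keep it short.

```sql
-- Sketch only: MySQL/InnoDB assumed, column types as given in the question.
-- Foreign keys to Cinemas.ID and Movies.ID are omitted for brevity.
CREATE TABLE Showtimes (
    CinemaID TINYINT  NOT NULL,
    MovieID  BIGINT   NOT NULL,
    Showtime DATETIME NOT NULL,
    -- Composite primary key replaces the separate ID column
    PRIMARY KEY (CinemaID, MovieID, Showtime),
    -- Secondary index (hypothetical name) for the city/date queries
    KEY idx_cinema_showtime (CinemaID, Showtime)
) ENGINE = InnoDB;

-- Query 1: everything in one city, sorted by date.
-- Can be served by idx_cinema_showtime.
EXPLAIN SELECT MovieID, Showtime
FROM Showtimes
WHERE CinemaID = 1
ORDER BY Showtime;

-- Query 2: one movie in one city.
-- Can be served by the primary key.
EXPLAIN SELECT Showtime
FROM Showtimes
WHERE CinemaID = 1 AND MovieID = 42
ORDER BY Showtime;

-- Query 3: one day in one city, written as a half-open range
-- so the whole day is covered without guessing the last second.
EXPLAIN SELECT MovieID, Showtime
FROM Showtimes
WHERE CinemaID = 1
  AND Showtime >= '2024-01-15'
  AND Showtime <  '2024-01-16';
```

Running the EXPLAIN statements against realistic data is the way to confirm which index the optimizer actually picks; on a tiny or empty table it may choose differently.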
Afterthought
My assumption that this uniquely identifies each row may NOT be correct if you have a movie showing on multiple screens at the same time. If you want to be able to separately identify those screens, then my solution is not best (I can provide another solution for that situation, please just let me know.)
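For what it's worth, one possible shape for the multi-screen case (not necessarily the alternative meant above) is to add a screen column and append it to the primary key. This is only a sketch: ScreenID is a hypothetical column that does not appear in the question, and MySQL/InnoDB is assumed.

```sql
-- Alternative sketch for the multi-screen case.
-- ScreenID is a hypothetical column; its name and type are assumptions.
CREATE TABLE Showtimes (
    CinemaID TINYINT  NOT NULL,
    ScreenID TINYINT  NOT NULL,
    MovieID  BIGINT   NOT NULL,
    Showtime DATETIME NOT NULL,
    -- ScreenID goes last so (CinemaID, MovieID, Showtime) remains a
    -- leftmost prefix and the per-movie query can still use the key
    PRIMARY KEY (CinemaID, MovieID, Showtime, ScreenID),
    KEY idx_cinema_showtime (CinemaID, Showtime)
) ENGINE = InnoDB;
```

Because ScreenID is appended rather than placed earlier, the three queries from the question keep the same leading columns to search on.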