Mysql – Nosql vs sql for genre management

dbmsmongodbMySQL

I'm looking to keep a simple database for movies, with details like Name, YearofRelease, Genre etc. The key factor is the genres entry, where a movie could fall into multiple genres. Using a sql based approach, one way is to have a separate table for each genre and enter to movie id. Using a nosql mongodb approach, the generes can be stored as an array.
From the purpose of both storing and searching by genres, what is the better approach?

Best Answer

The only real questions to mainly be considered when choosing between NoSQL and a Relational Database Management System (RDBMS) are:

  1. "Does the data have a well defined structure or is it very unstructured / variable in structure?"
  2. "Are the questions I need to ask of the data always simple?"
  3. And less important but valid: "Do I need to distribute the data across multi-nodes of servers in a cluster, or having one centralized server is sufficient?"

For #1 it sounds like you have a well structured set of data. In fact, it's a very simple structure and should be very easy to store and maintain in a RDBMS. And if you go with a RDBMS then to support multiple Genres for multiple Movies all you would need is one additional linking table (aka bridge / junction table) called MoviesGenres that would be many-to-many with Movies by storing the MovieId and GenreId, one row per combination. You wouldn't need a "separate table for each genre" to solve this, just the one additional table MoviesGenres. So if MovieABC had Genre1 and Genre4 then there would be two rows in your MoviesGenres table for the records (MovieABC, Genre1) and (MovieABC, Genre4).

For #2, that's going to be up to your use cases. If you ever want to ask more complex questions like "How many movies with 3 or more genres were released after 2020?" or "Which movies have both Genre7 and Genre9, and were released before 2000?", etc, then a RDBMS makes it much easier to query these kinds of things and probably are more efficient at doing so (generally speaking).

For #3, this just depends on the scale of your data and your preference of how to maintain the hardware behind it. If you plan to have many terabytes of data and multi-billions of rows, do you prefer to pay for a multi-node cluster to horizontally scale the data at the tradeoff that it may not always be consistently in sync across every node in the cluster, or would you prefer to centralize all the data to one main server and scale vertically as the data grows. This is kind of a moot point nowadays, since even RDBMS now exist with features to support horizontal scaling too. So unless you foresee storing a very large amount of data upfront, I wouldn't be as concerned with this question as the prior two.

In general, based on what information you have provided, my recommendation would be to use a SQL (RDBMS) solution so you have the flexibility to query it as you need in a performant manner, given the fact you have a well defined and simple structure for your data.