I'm looking to keep a simple database for movies, with details like Name, YearofRelease, Genre etc. The key factor is the genres entry, where a movie could fall into multiple genres. Using a sql based approach, one way is to have a separate table for each genre and enter to movie id. Using a nosql mongodb approach, the generes can be stored as an array.
From the purpose of both storing and searching by genres, what is the better approach?
Mysql – Nosql vs sql for genre management
dbmsmongodbMySQL
Best Answer
The only real questions to mainly be considered when choosing between NoSQL and a Relational Database Management System (RDBMS) are:
For #1 it sounds like you have a well structured set of data. In fact, it's a very simple structure and should be very easy to store and maintain in a RDBMS. And if you go with a RDBMS then to support multiple
Genres
for multipleMovies
all you would need is one additional linking table (aka bridge / junction table) calledMoviesGenres
that would be many-to-many withMovies
by storing theMovieId
andGenreId
, one row per combination. You wouldn't need a "separate table for each genre" to solve this, just the one additional tableMoviesGenres
. So ifMovieABC
hadGenre1
andGenre4
then there would be two rows in yourMoviesGenres
table for the records(MovieABC, Genre1)
and(MovieABC, Genre4)
.For #2, that's going to be up to your use cases. If you ever want to ask more complex questions like "How many movies with 3 or more genres were released after 2020?" or "Which movies have both Genre7 and Genre9, and were released before 2000?", etc, then a RDBMS makes it much easier to query these kinds of things and probably are more efficient at doing so (generally speaking).
For #3, this just depends on the scale of your data and your preference of how to maintain the hardware behind it. If you plan to have many terabytes of data and multi-billions of rows, do you prefer to pay for a multi-node cluster to horizontally scale the data at the tradeoff that it may not always be consistently in sync across every node in the cluster, or would you prefer to centralize all the data to one main server and scale vertically as the data grows. This is kind of a moot point nowadays, since even RDBMS now exist with features to support horizontal scaling too. So unless you foresee storing a very large amount of data upfront, I wouldn't be as concerned with this question as the prior two.
In general, based on what information you have provided, my recommendation would be to use a SQL (RDBMS) solution so you have the flexibility to query it as you need in a performant manner, given the fact you have a well defined and simple structure for your data.