Postgresql – Recommending movies customers haven’t rented yet based on their favorite genres, according to their rental history

postgresql

Disclaimer

I do NOT want any explicit queries to my question. I want suggestions regarding how I can best approach this problem. I am eager to learn SQL by experimentation, but for my sake, I'd prefer not to have my hand held. Thanks!

Database/ER model

I am using the DVD Rental database found on the PostgreSQLTutorial website. Here's the ER diagram they've provided (the zip is provided on the site):

enter image description here

Preliminary problem leading up to question

For my own practice, I wanted to see if I could come up with a list of customers (customer) and the genres (film_category) from which they've rented (payment and rental) movies, as well as counts for each of those genres per customer. Here is the query I wrote, which works correctly:

SELECT customer.first_name || ' ' || customer.last_name AS customer, category.name AS movie_genre, COUNT(category.name) AS num_rented

FROM customer JOIN payment ON customer.customer_id = payment.customer_id
JOIN rental ON payment.rental_id = rental.rental_id
JOIN inventory ON rental.inventory_id = inventory.inventory_id
JOIN film_category ON inventory.film_id = film_category.film_id
JOIN category ON film_category.category_id = category.category_id

GROUP BY customer, movie_genre
ORDER BY customer, num_rented DESC;

Problem

To challenge myself further, I want to see if I can list each customer's name along with the names of movies they have not yet rented that belong to the genre from which they've purchased the most movies (or the first one, if there's a tie). For example, here's a snapshot of some of the data I get when running the above query:

enter image description here

And here's a preview of the first 19 rows of the film table (not joined in the above query, but it will need to be joined for this problem):

enter image description here

So for example, we see that Aaron Selby really likes Travel movies, so we want to list her name alongside all movies belonging to the Travel category, but only those movies she has not yet rented.

Question

Could someone please offer some guidance/hints as to where I can begin? I know I'll need to join the film table, but here are some challenges:

  • limiting the output for each customer to only the most common genre they've rented

  • based on that information, retrieving all movies from films corresponding to that genre that the customer has not yet rented

Best Answer

Most common ...

In statistics, this is called mode.

Yes, PostgreSQL has an aggregate called mode. I haven't messed with it in PostgreSQL.

Retrieves those not wanted...

You want the resulting set of all movies for a genre minus the data set containing all movies rented by a customer for that genre

In PostgreSQL, you'll want to say EXCEPT instead of MINUS.