The current pandemic is making it very difficult to contact my teachers for assistance, so I'm hoping I can describe my problem well enough to try getting some help here.
The database used is the sakila example database
(I tried to create a fiddle, but the tables are just too big)
My data will be grouped by each store (only 2 video stores), and I will be getting
- The total sales of each store (teachers hint: store table has inventory. inventory can be rented, rental table has payments.
- count of customers per store
- inventory count per store
- count of movies per store
- count of movies rented that are not yet returned
Here is my incorrect output:
--------------------------------------------------------------------------------------------------------------------------------
| Jordan_Rasmussen | store_id | total_sales | num_customers | count_inventory | count_titles | inventory_cost | num_rentals_out |
--------------------------------------------------------------------------------------------------------------------------------
| (date&time) | 1 | 68359569.18 | 326 | 2270 | 759 | 952923.30 | 29992 |
----------------------------------------------------------------------------------------------------------------------------
| (date&time) | 2 | 56966647.92 | 273 | 2311 | 762 | 970134.69 | 24843 | |
--------------------------------------------------------------------------------------------------------------------------------
Here is the correct output:
--------------------------------------------------------------------------------------------------------------------------------
| First_and_last N | store_id | total_sales | num_customers | count_inventory | count_titles | inventory_cost | num_rentals_out |
--------------------------------------------------------------------------------------------------------------------------------
| (date&time) | 1 | 209691.93 | 326 | 2270 | 759 | 46205.30 | 92 |
--------------------------------------------------------------------------------------------------------------------------------
| (date&time) | 2 | 208669.04 | 273 | 2311 | 762 | 46415.89 | 91 |
--------------------------------------------------------------------------------------------------------------------------------
Here is my code for trying to get the above output:
SELECT NOW() AS 'Jordan_Rasmussen',
s.store_id,
-- Get the total sales
SUM(p.amount) AS total_sales,
-- Get the number of customers
COUNT(DISTINCT c.customer_id) AS num_customers,
-- Get the inventory count
COUNT(DISTINCT i.inventory_id) AS inventory_count,
-- Get the number of movie titles
COUNT(DISTINCT f.title) AS num_titles,
-- Get the inventory value
SUM(DISTINCT f.replacement_cost) * COUNT(DISTINCT i.inventory_id) AS inventory_value,
-- Get the number of movies rented that have not yet been returned
COUNT(r.rental_date) AS num_rentals_out
FROM store AS s
LEFT JOIN inventory AS i ON s.store_id = i.store_id
LEFT JOIN customer AS c ON s.store_id = c.store_id
INNER JOIN rental AS r ON i.inventory_id = r.inventory_id
INNER JOIN payment AS p ON r.rental_id = p.rental_id
INNER JOIN film AS f ON i.film_id = f.film_id
GROUP BY store_id;
I can get the correct results by themselves, but I notice the more I JOIN tables together, the more the results change. I've been scratching my head about this for awhile now, but I'm just not sure what it is I'm missing.
Should I be using subqueries? Or am I just joining improperly?
Sorry for such a big question, but I'm clueless right now.
Best Answer
The problem is you are joining unrelated tables. If you don't join them, as in
you get:
Total sales and num_rentals_out differ from your correct output. I wonder who is right. The Sakila structure diagram, if I'm not mistaken, has it's crow's feet inverted.
The same result can be obtained, while querying inventory only once, using a CTE, the Common Table Expression "WITH", as follows:
Querying the store table can also be dropped, since all we need is the store_id, which we also find in inventory.