Create a DELETE statement that deletes Manufacturers with less than 10 sales

database-designsqlitesubquery

Orders:

Order number (Primary Key)
Date of order
Customer ID (Primary Key for Customers table)
Serial Number (Primary Key for Machines table)

Customers:

ID (Primary Key)
Name
Phone Number
Street (ie. 505 Ramapo Valley Road)
City (ie. Mahwah)
State
Zip

Machines:

Serial Number (Primary Key)
Model Number (Primary Key for Models table)

Model:

ID (Primary key)
Speed
RAM
HD
Price
Manufacturer ID (Primary key for Manufacturer table)

Manufacturer:

ID (Primary Key)
Name
Phone Number
Email Address

Above are tables with different attributes.

I have a lab due soon and one of the tasks is to create an SQLite statement to DELETE Manufacturers with less than ten sales. I have tried to write a sub query, but am not able to formulate a statement that works. The following is what I have so far:
(I know I am off by a lot)

DELETE FROM Manufacturer    
HAVING COUNT(*) < 10
WHERE ID IN
(SELECT ManufacturerID FROM Model WHERE Manufacturer
    WHERE Manufacturer.ID = Model.ManufacturerID

SELECT ModelNo FROM Machines
    WHERE Machines.ModelNo = Model.ID

SELECT SerialNo FROM Orders
    ON Orders.SerialNo = Machines.SerialNo

Best Answer

So you're not too far off, you're just missing a couple key things, the first being the JOIN keyword. JOIN is how you can combine two related datasets. There are different types of joins, one being an INNER JOIN which is how you can combine two datasets and only return the rows that match the join condition. For example, if you wanted to get only Customers who have an Order and you want to know the Name of that Customer and the SerialNumber of their Order you could do so with an INNER JOIN like so:

SELECT Customers.Name, Orders.SerialNumber
FROM Customers
INNER JOIN Orders
    ON Customers.ID = Orders.CustomerID

You can join as many datasets as you like in a single query. So if we also wanted to know the ModelNumber for the above Customers' Orders, we could modify the above query like so:

SELECT Customers.Name, Orders.SerialNumber, Machines.ModelNumber
FROM Customers
INNER JOIN Orders
    ON Customers.ID = Orders.CustomerID
INNER JOIN Machines
    ON Orders.SerialNumber = Machines.SerialNumber

Deriving from the above example we can do the same to get a list of all Manufacturers' Orders, by chaining together a bunch of INNER JOIN clauses. Lets only return the ID of the Manufacturer as well, since that's all we care about right now. Note since this is a list of all Manufacturers' Orders, there will be one row for every Order, so you should see the same ID of the Manufacturer repeated in cases where they have more than one order:

SELECT Manufacturer.ID
FROM Manufacturer
INNER JOIN Model
    ON Manufacturer.ID = Model.ManufacturerID
INNER JOIN Machines
    ON Model.ID = Machines.ModelNumber
INNER JOIN Orders
    ON Machines.SerialNumber = Orders.SerialNumber

Now you need a way to count how many Orders (how many rows) there are for each Manufacturer.ID from the previous query. To do that we can use the built in COUNT() function. Because we want to count the number of Orders for a certain criteria (by Manufacturer), we'll also need to use the GROUP BY clause. The GROUP BY clause allows us to roll-up (flatten) all the rows by the criteria we specify into a single unique row per grouping based on the columns we specify. In this case we just want a single row per Manufacturer.ID so that when we use the COUNT() function, it counts the number of Orders within that grouping, like so:

SELECT Manufacturer.ID, COUNT(*) AS OrderCount
FROM Manufacturer
INNER JOIN Model
    ON Manufacturer.ID = Model.ManufacturerID
INNER JOIN Machines
    ON Model.ID = Machines.ModelNumber
INNER JOIN Orders
    ON Machines.SerialNumber = Orders.SerialNumber
GROUP BY Manufacturer.ID

So the above query now gives us the list of Manufacturer IDs and the count of how many Orders each one has. Let's filter it down to only the Manufacturers who have less than 10 Orders. We will use the HAVING clause to accomplish this, which is the equivalent of using a WHERE clause to filter data but can only be used with aggregate functions like COUNT() on grouped data:

SELECT Manufacturer.ID, COUNT(*) AS OrderCount
FROM Manufacturer
INNER JOIN Model
    ON Manufacturer.ID = Model.ManufacturerID
INNER JOIN Machines
    ON Model.ID = Machines.ModelNumber
INNER JOIN Orders
    ON Machines.SerialNumber = Orders.SerialNumber
GROUP BY Manufacturer.ID
HAVING COUNT(*) < 10 -- Filter out everything except Manufacturers with less than 10 Orders

Finally, we want to use a subquery as you mentioned, so we can DELETE the results in our above query from the Manufacturer table like so:

DELETE FROM Manufacturer
WHERE ID IN
(
    SELECT Manufacturer.ID
    FROM Manufacturer
    INNER JOIN Model
        ON Manufacturer.ID = Model.ManufacturerID
    INNER JOIN Machines
        ON Model.ID = Machines.ModelNumber
    INNER JOIN Orders
        ON Machines.SerialNumber = Orders.SerialNumber
    GROUP BY Manufacturer.ID
    HAVING COUNT(*) < 10 -- Filter out everything except Manufacturers with less than 10 Orders
)

Note that when you use a subquery inside of the IN keyword, you can only return one column, and for your case you only care about the ID of the Manufacturers for who needs to be deleted. So I removed the OrderCount column from the subquery above.

There are definitely alternative ways to accomplish the above as well, but to keep the concepts you need to understand to a minimal, this is a pretty good solution. The concepts I hope you took away from this are:

  1. Joins, specifically the INNER JOIN clause, and you should research Outer Joins as well.
  2. Aggregate Functions such as COUNT(), there are many others too such as MIN(), MAX(), and SUM(), etc.
  3. The GROUP BY clause and how grouping works in general.
  4. The HAVING clause and how and when to use it.