Mysql – How to write a query to get wine pairing for a pizza based on pairing weight

greatest-n-per-groupMySQLmysql-8.0

I am working on a query to pair drinks to pizzas.

Currently I am able to show the pizza_name, beverage and the pairing_weight being the amount of toppings on a pizza that the beverage pairs with. From here, I am looking to just output a mapping of distinct pizza name to beverage.

I am unsure how to format the query to just return the top result per pizza based on pairing_weight.

mysql> select pizza_name, beverage, COUNT(beverage) as pairing_weight
    -> from pizza_production
    ->     JOIN pizza_restrictions pr on
    ->         pizza_production.topping_name = pr.topping_name
    ->     JOIN beverage_pairing_notes on
    ->         restriction = pairing
    ->     GROUP BY pizza_name, beverage
    ->     ORDER BY pizza_name, pairing_weight DESC;
+--------------+--------------------+----------------+
| pizza_name   | beverage           | pairing_weight |
+--------------+--------------------+----------------+
| Grand Padano | Prosecco           |              2 |
| Grand Padano | Champagne          |              2 |
| Grand Padano | Riesling           |              2 |
| Grand Padano | Cava               |              2 |
| Grand Padano | Pinot Noir         |              1 |
| Grand Padano | Carlsberg beer     |              1 |
| Grand Padano | Zinfandel Rosé     |              1 |
| Grand Padano | Chardonnay         |              1 |
| Grand Padano | Chenin Blanc       |              1 |
| Grand Padano | Gewürztraminer     |              1 |
| Grand Padano | Guinness beer      |              1 |
| Grand Padano | Pinot Grigio       |              1 |
| Grand Padano | Heineken beer      |              1 |
| Grand Padano | Irn Bru            |              1 |
| Grand Padano | Malbec             |              1 |
| Grand Padano | Muscat Blanc       |              1 |
| new york     | Riesling           |              2 |
| new york     | Champagne          |              2 |
| new york     | Prosecco           |              2 |
| new york     | Pinot Noir         |              2 |
| new york     | Malbec             |              2 |
| new york     | Syrah              |              1 |
| new york     | Cava               |              1 |
| new york     | Cabernet Sauvignon |              1 |
| new york     | Chardonnay         |              1 |
| new york     | Carlsberg beer     |              1 |
| new york     | Chenin Blanc       |              1 |
| new york     | Gewürztraminer     |              1 |
| new york     | Guinness beer      |              1 |
| new york     | Muscat Blanc       |              1 |
| new york     | Heineken beer      |              1 |
| pepperoni    | Pinot Noir         |              3 |
| pepperoni    | Malbec             |              3 |
| pepperoni    | Muscat Blanc       |              2 |
| pepperoni    | Heineken beer      |              2 |
| pepperoni    | Guinness beer      |              2 |
| pepperoni    | Gewürztraminer     |              2 |
| pepperoni    | Chenin Blanc       |              2 |
| pepperoni    | Riesling           |              2 |
| pepperoni    | Carlsberg beer     |              2 |
| pepperoni    | Cabernet Sauvignon |              2 |
| pepperoni    | Chardonnay         |              1 |
| pepperoni    | Champagne          |              1 |
| pepperoni    | Prosecco           |              1 |
| pepperoni    | Cava               |              1 |
| pepperoni    | Rioja              |              1 |
| pepperoni    | Syrah              |              1 |
| pepperoni    | Zinfandel Rosé     |              1 |
| vegetarian   | Diet Coke          |              2 |
| vegetarian   | Prosecco           |              1 |
| vegetarian   | Champagne          |              1 |
| vegetarian   | Riesling           |              1 |
+--------------+--------------------+----------------+
52 rows in set (0.01 sec)

An example of desired output would be

+--------------+--------------------+
| pizza_name   | beverage           |
+--------------+--------------------+
| pepperoni    | Pinot Noir         |
| new york     | Riesling           |
| Grand Padano | Prosecco           |
| vegetarian   | Diet Coke          |
+--------------+--------------------+

Is there a way to pull this data from what I already have in the output of the query above?

Version:

mysql> status -------------- /usr/local/mysql/bin/mysql  Ver 8.0.23 for macos10.15 on x86_64 (MySQL Community Server - GPL)

If the pairing_weight is the same for multiple beverages per pizza then a choice of any of them is fine, it can be totally random if needed or just the first one the query displays with the highest weight.

Best Answer

As a classic Greatest N per group problem, this is easily solvable using the ROW_NUMBER() function:

SELECT
  pizza_name, beverage
FROM
  (
    select
        pizza_name, beverage,
        ROW_NUMBER() OVER (
          PARTITION BY pizza_name
          ORDER BY COUNT(beverage) DESC
        ) as pizza_beverage_ranking
    from pizza_production
        JOIN pizza_restrictions pr on
            pizza_production.topping_name = pr.topping_name
        JOIN beverage_pairing_notes on
            restriction = pairing
        GROUP BY pizza_name, beverage
  ) AS derived
WHERE
  pizza_beverage_ranking = 1
;

As specified in the ORDER BY clause of the ROW_NUMBER() function, the rankings are assigned in the descending order of COUNT(beverage). Beyond that, however, the order is arbitrary, and when two or more rows have the same weight, which one gets lower or higher ranking is going to be up to the server. You can add more criteria after COUNT(beverage) DESC if you want the order to be more specific and, consequently, the results more predictable. Apart from sorting, there's also the PARTITION BY clause, which additionally specifies that the enumeration be done separately for every group of rows sharing the same pizza_name value.

Once the row numbers are assigned, you can filter on them, as you can see done in the outer SELECT's WHERE clause.

As a side note, please consider always to qualify columns with table names or aliases (like pizza_production.pizza_name instead of pizza_name, for example) where the query involves more than one table. While the server can certainly resolve unqualified columns if there are no conflicts, it is easier for a human to read and understand a query if the columns are unambiguously qualified, particularly when the reader is not very familiar with the schema. This is as true for queries that you offer for study to an unknown audience, as it is for queries you have once written yourself and forgotten but have to go back to for some reason or other.