What Is the Next Best Ingredient to Add

database-designrecursive

I am building a program that allows users to optimize their grocery shopping so they can make the most recipes using the fewest ingredients.

One of the features of this program is a function I’m calling the “NEXT BEST INGREDIENT” or NBI. For example, if you already have salt, oil, and rosemary, what NBI will unlock the most additional recipes?

Let’s say the answer is beef. If you buy beef, you’ll be able to create more new recipes than if you bought any other ingredient.

After buying beef, what is the NEXT BEST INGREDIENT one should add after that? And so on.

The feature will allow a user to enter any number of starting ingredients (including zero). So someone could start with 0, 3, or even 50 ingredients. What should they add next?

I’m using a database of recipes (about 500K recipes in all) to compile results.

I can map out the individuals steps for writing out a specific algorithm. But I need help writing a more generic algorithm.

Here’s a specific algorithm:

A user enters in 3 ingredients A, B, and C

  1. Isolate all 4-ingredient recipes that use use A+B+C + BLANK. Note the BLANK with highest frequency.

  2. Isolate all 3-ingredient recipes that use A+B + BLANK… A+C + BLANK… B+C + BLANK. Note the BLANK with highest frequency.

  3. Isolate all 2-ingredient recipes that use A + BLANK….B + BLANK…C+ BLANK. Note the BLANK with highest frequency.

  4. Isolate all 1-ingredient recipes that simply use BLANK. Note the BLANK with highest frequency.

  5. Calculate the BLANK with the highest TOTAL frequency. This is the NEXT BEST INGREDIENT.

But I need a way to write a more generic algorithm for when a user enters N number of ingredients (anything from 0 ingredients to 100).

I can write out the rules in plain English – but my programmer needs a way to write a generic rule using coding logic.

Here it is in plain English.

When user searches N ingredients (including zero), instantly ignore any recipes that require N+2 ingredients or more. And then isolate ALL remaining recipes that use ALL, SOME, or NONE of those ingredients such that there is 1 (and only 1) vacant slot for an “unsearched” ingredient.
Tally up the frequency of all “unsearched” ingredients. Whichever “unsearched” ingredient has the highest frequency becomes the NEXT BEST INGREDIENT.

The goal is to make this search feature accurate (or as accurate as my database is) and fast. A few ingredients is easy. But when looking at 10+ ingredients, it may slow down a bit.

Any thoughts?

Best Answer

If you assume a 3NF design in the entity relationship diagram below:

Entity Relationship Diagram with 4 tables

Recipes are stored in a Recipe table, a recipe's ingredients are stored in RecipeIngredients table, and the MyIngredients table contains an entry for each ingredient on hand.

Your Next Best Ingredient(s) is found with the following query:

SELECT Ingredients.IngredientID 
  , Ingredients.IngredientName
  , count() RecipesCompleted
FROM Ingredients 
 INNER JOIN RecipeIngredients on Ingredients.IngredientID = RecipeIngredients.IngredientID
 INNER JOIN (
   SELECT RecipeID
     , count() RemainingIngredients
   FROM Recipes
     INNER JOIN RecipeIngredients on Recipes.RecipeID = RecipeIngredients.IngredientID
     LEFT JOIN MyIngredients on RecipeIngredients.IngredientID = MyIngredients.IngredientID
   WHERE MyIngredients.IngredientID is null
   GROUP BY RecipeID
   HAVING count(*) = 1) RecipesWOneIngLeft 
      on RecipeIngredients.RecipeID = RecipesWOneIngLeft.RecipeID
 LEFT JOIN MyIngredients on RecipIngredients.IngredientID = MyIngredients.IngredientID
WHERE MyIngredients.IngredientID is null
GROUP BY Ingredients.IngredientID
 , Ingredients.IngredientName
ORDER BY RecipesCompleted DESC 

Let's break the query down a bit. It works using the following theories:

  1. By inserting your ingredients into the MyIngredients table allows, you're allowed to have N ingredients.
  2. The next step is to use your algorithm to identify all recipes which are missing just one ingredient. This is done in the RecipesWOneIngLeft sub-query
  3. The final step counts the number of recipes which are completed by each ingredient and then orders the result so the NBI is first.

Consider rewording your plain english algorithm to the following:


When user searches N ingredients, isolate ALL remaining recipes such that there is 1 (and only 1) vacant slot for an “unsearched” ingredient. Tally up the frequency of all “unsearched” ingredients. Whichever “unsearched” ingredient has the highest frequency becomes the NEXT BEST INGREDIENT.