Mysql – How to create Combination of records (Order does not matter, no repetition allowed) in theSQL tables

MySQL

I've got a table that has hundreds of rows, each row is a recipe with nutritional information, for example:

recipe_table:

id  | calories | protein| carbs | fat

recipe1, 100,    20g,     10g,     2g
recipe2, 110,    10g,     12g,     12g
recipe3, 240,    20g,     1g,      23g
....

I needed to create a new table (recipe_index) that would show every possible combination of every recipe in recipe_table as a set of 3, so it would look something like:

recipe_index:

id1     | id2    | id3    |calories| protein | carbs | fat
recipe1, recipe2, recipe3,   450,     50g,      23g,   37g
....

Basically it allows me to query recipe_index and say "what 3 recipe combinations come to a total value that's between 440 calories and 460 calories"

My current code for doing this works at 3 meals, however I end up with about 450,000 records in recipe_index, I need to do this same thing for 4,5 and 6 meals as well, so I'm calculating billions of records at the end of this. Is there a more efficient way of doing this? Perhaps I need to look into partitioning a table for each range?

My current SQL code:

INSERT INTO recipe_index
SELECT distinct '3' as nummeals, t1.id as id1, t2.id as id2, t3.id as id3, 0 as id4,   
t1.calories_ps+t2.calories_ps+t3.calories_ps as calories,    
t1.protein_ps+t2.protein_ps+t3.protein_ps as  
protein, t1.carbohydrate_ps+t2.carbohydrate_ps+t3.carbohydrate_ps as carbohydrate, 
t1.fat_ps+t2.fat_ps+t3.fat_ps as fat from recipes t1 inner join  recipes t2  on t1.Id <      
t2.Id inner join  recipes t3  on t2.Id < t3.Id WHERE t1.image <> '' AND t2.image <> ''   
AND t3.image <> ''

Best Answer

Like Mike says, this is not something that you should be doing with the database. This is more of an algorithm question.

Your problem looks very similar to the knapsack problem, which is NP-Complete. Fortunately, you are not looking for the set that adds up to exactly some amount of calories, but a range, so it's not NP-Complete. Luckily for you, a few hundred recipes will easily fit in memory, so you can easily write an algorithm for it.

Here's a simple algorithm. I'm sure there's a better way to do it:

orderedRecipes = recipes sorted by calorie asc

def combinations(recipes, lowerBound, upperBound, recipeCount):
    first = recipes[0]
    if first.calorie < upperBound:
        for c in combinations(recipes[1:], lowerBound - first.calorie, upperBound - first.calorie, recipeCount - 1):
            if sum(r.calorie for r in [first] + c) > lowerBound:
                yield [first] + c
        for c in combinations(recipes[1:], lowerBound, upperBound, recipeCount):
            if sum(r.calorie for r in c) > lowerBound:
                yield c

combinationsOf3Between440and460 = list(combinations(orderedRecipes, 440, 460, 3))

The first loop takes the first item in the rest of the list, and the second loop does not. You don't want it to always take the first item, you want both take and no-take.