Postgresql – Distribute rows into groups while minimizing difference

postgresql

I have a table of users and values

|---------------------|------------------|
|         User        |       Value      |
|---------------------|------------------|
|          1          |        34        |
|---------------------|------------------|
|          2          |        120       |
|---------------------|------------------|
|          3          |        6         |
|---------------------|------------------|

and I would like to distribute these rows into two groups while minimizing the difference in SUM(Value) between each group. The number of users between the two groups should differ by at most one if initial user count is odd, or be of equal length if the number of initial users is even.

|---------------------|------------------|------------------|
|        Group        |       Users      |    SUM(Value)    |
|---------------------|------------------|------------------|
|          1          |       1,3        |        40        |
|---------------------|------------------|------------------|
|          2          |        2         |       120        |
|---------------------|------------------|------------------|

This should work for any number of users greater than 2. Is there something built-in to Postgres to help with this?

Best Answer

Only if you count SQL itself :) The DB does not have any way of solving this type of thing other than the naive way, but the naive way of generating all possible distributions of rows to groups, filtering to only those that satisfy the num_users_in_longest<=(num_users_in_shortest+1) requirement, and then calculating the delta for each distribution and using a LIMIT 1 to filter to the smallest one can be expressed in SQL without too much awkwardness. Here's a fiddle that does what you want. Again, this is doing the underlying work in the naive, inefficient way, and this won't scale to large datasets. For that, you'll need to find some clever algorithm to do this, and probably do it in a procedural language that gives you the flow control needed to implement it.