Postgresql – Calculating available inventory for a perishable good

database-designfunctionspostgresql

Long time reader but first post!

I am designing a database that requires me to be able to calculate the inventory available for a perishable good on any given day. In my case, it is for a farm. Farmers can add inventory updates for any present or future date (today I have 4 available, tomorrow I have 5 more available). Farmers also set how long their inventory lasts before it expires (this is per farm, not per item). Order items inventory purchased (order items) must be a part of an order that has a fulfillment date before the inventory expires. The inventory cannot be purchased for a fulfillment date after its expiration. More inventory cannot be purchased than is available.

Currently, I have a job that runs every night that removes expired inventory (inserts it into inventory_expired), by calculating the total amount of inventory that expires today or previously and subtracts the quantity in orders placed and the inventory that was already expired. I feel like this isn't the right way to go about it. My approach lets me accurately calculate the inventory up until the current day, and in the past accurately, but can't calculate future inventory properly, because the future date doesn't know what inventory has expired.

Maybe I am making this too complicated. Is there a better way to structure what I am trying to do? Or is there a function I can write that would be able to calculate the available inventory from my current design?

Here is what is relevant out of my current tables:


CREATE TABLE organizations (
  id int NOT NULL,
  PRIMARY KEY(id)
);
CREATE TABLE farms (
  id int NOT NULL,
  organization_id int NOT NULL,
  inventory_expiration_days int NOT NULL,
  PRIMARY KEY(id),
  FOREIGN KEY (organization_id) REFERENCES organizations (id)
);


CREATE TABLE inventory (
 id int NOT NULL,
 organization_id int NOT NULL,
 farm_id int NOT NULL,
  primary key(id),
 FOREIGN KEY (organization_id) REFERENCES organizations (id),
 FOREIGN KEY (farm_id) REFERENCES farms (id)
);

CREATE TABLE inventory_updates (
  id int NOT NULL,
  date date NOT NULL,
  inventory_id int NOT NULL,
  quantity_added int,
  quantity_removed int,
  FOREIGN KEY(inventory_id) REFERENCES inventory (id)
);

CREATE TABLE inventory_expired (
  id int NOT NULL,
  inventory_id int NOT NULL,
  date date NOT NULL,
  quantity_expired int NOT NULL,
  FOREIGN KEY(inventory_id) REFERENCES inventory (id)
);

CREATE TABLE orders (
  id int NOT NULL,
  fulfillment_date date NOT NULL,
  seller_org_id int NOT NULL,
  primary key(id),
  FOREIGN KEY(seller_org_id) REFERENCES organizations (id)
);

CREATE TABLE order_item (
 id int NOT NULL,
  order_id int not null,
 inventory_id int NOT NULL,
 quantity int NOT NULL,
 FOREIGN KEY(inventory_id) REFERENCES inventory (id),
  FOREIGN KEY(order_id) REFERENCES orders(id)
);

And here is my current function, which works for calculating inventory for dates up to today but not in the future. I put a hard coded date in (2021-02-21) for now.

SELECT t.inventory_id, 
COALESCE(t.total_quantity, 0) as total_quantity, 
COALESCE(exp.quantity_expired_previously, 0) quantity_previously_expired, 
COALESCE(sold.quantity_sold, 0) as quantity_sold,
COALESCE(t.total_quantity, 0) - COALESCE(exp.quantity_expired_previously, 0) - COALESCE(sold.quantity_sold, 0) as quantity_available
FROM
(SELECT u2.inventory_id as inventory_id, SUM(COALESCE(u2.quantity_added, 0) - coalesce(u2.quantity_removed, 0)) as total_quantity
  FROM inventory_updates u2 
  LEFT JOIN inventory i2 on i2.id = u2.inventory_id
  LEFT JOIN farms f2 on f2.organization_id = i2.organization_id
  WHERE u2.date <= ('2021-02-21'::date)
  GROUP BY u2.inventory_id) t
LEFT JOIN (
  SELECT e.inventory_id, COALESCE(SUM(e.quantity_expired), 0) as quantity_expired_previously
  FROM inventory_expired e
  WHERE e.date <= ('2021-02-21'::date)
  GROUP BY e.inventory_id
) exp on exp.inventory_id = t.inventory_id
LEFT JOIN (
  SELECT i.inventory_id, SUM(i.quantity) as quantity_sold
  FROM order_item i
  LEFT JOIN orders o on o.id = i.order_id
  LEFT JOIN farms f on f.organization_id = o.seller_org_id
  WHERE o.fulfillment_date < ('2021-02-21'::date + (f.inventory_expiration_days * interval '1 day'))
  GROUP BY i.inventory_id
) sold on sold.inventory_id = t.inventory_id

I was wondering if some sort of union on all of the updates that occur to the inventory quantities could work better, but also think I am going to run into problems when I have lots of rows, and any sort of sum over everything is going to take a long time. Basically, I am pretty stuck.

Best Answer

I'm ignoring efficiency concerns for the moment, because I'm not convinced it's a problem. Try this route and see if it's too slow.

Use the FILTER clause on aggregate functions to only include those rows that should be valid as-of the date in question. Here's an example that only takes quantity_added and the expiration time into account:

-- Set up 1 farm with a 10-day expiration
insert into organizations select 1;
insert into farms select 2, 1, 10;
insert into inventory select 3, 1, 2;
-- +100 inventory on feb 1, +50 on feb 2
insert into inventory_updates select 4, '2021-02-01', 3, 100, 0;
insert into inventory_updates select 4, '2021-02-02', 3, 50, 0;
with date_range as (
  select d from generate_series('2021-02-01'::date, '2021-02-15', interval '1 days') d
  )
  select d, inventory_id,
  sum(quantity_added) FILTER (
    where inv_upd.date <= d and inv_upd.date > (
      d- (interval '1 days' * f.inventory_expiration_days)
    )) as running_total
from date_range
cross join inventory_updates inv_upd
join inventory on inv_upd.inventory_id=inventory.id
join farms f on f.id=inventory.farm_id
      group by 1, 2
      order by 1, 2;

Which produces:

| d                        | inventory_id | running_total |
| ------------------------ | ------------ | ------------- |
| 2021-02-01T00:00:00.000Z | 3            | 100           |
| 2021-02-02T00:00:00.000Z | 3            | 150           |
| 2021-02-03T00:00:00.000Z | 3            | 150           |
| 2021-02-04T00:00:00.000Z | 3            | 150           |
| 2021-02-05T00:00:00.000Z | 3            | 150           |
| 2021-02-06T00:00:00.000Z | 3            | 150           |
| 2021-02-07T00:00:00.000Z | 3            | 150           |
| 2021-02-08T00:00:00.000Z | 3            | 150           |
| 2021-02-09T00:00:00.000Z | 3            | 150           |
| 2021-02-10T00:00:00.000Z | 3            | 150           |
| 2021-02-11T00:00:00.000Z | 3            | 50            |
| 2021-02-12T00:00:00.000Z | 3            |               |
| 2021-02-13T00:00:00.000Z | 3            |               |
| 2021-02-14T00:00:00.000Z | 3            |               |
| 2021-02-15T00:00:00.000Z | 3            |               |

To finish this, you'd need to also join the orders table and subtract out those, as well as using something like sum(quantity_added - quantity_removed) FILTER (...) for the running_total expression. And, of course, you'd just interpolate your date, the date_range thing is just to illustrate the running-total behavior.