Postgresql – Calculating available inventory for a perishable good

database-designfunctionspostgresql

Long time reader but first post!

I am designing a database that requires me to be able to calculate the inventory available for a perishable good on any given day. In my case, it is for a farm. Farmers can add inventory updates for any present or future date (today I have 4 available, tomorrow I have 5 more available). Farmers also set how long their inventory lasts before it expires (this is per farm, not per item). Order items inventory purchased (order items) must be a part of an order that has a fulfillment date before the inventory expires. The inventory cannot be purchased for a fulfillment date after its expiration. More inventory cannot be purchased than is available.

Currently, I have a job that runs every night that removes expired inventory (inserts it into inventory_expired), by calculating the total amount of inventory that expires today or previously and subtracts the quantity in orders placed and the inventory that was already expired. I feel like this isn't the right way to go about it. My approach lets me accurately calculate the inventory up until the current day, and in the past accurately, but can't calculate future inventory properly, because the future date doesn't know what inventory has expired.

Maybe I am making this too complicated. Is there a better way to structure what I am trying to do? Or is there a function I can write that would be able to calculate the available inventory from my current design?

Here is what is relevant out of my current tables:


CREATE TABLE organizations (
  id int NOT NULL,
  PRIMARY KEY(id)
);
CREATE TABLE farms (
  id int NOT NULL,
  organization_id int NOT NULL,
  inventory_expiration_days int NOT NULL,
  PRIMARY KEY(id),
  FOREIGN KEY (organization_id) REFERENCES organizations (id)
);


CREATE TABLE inventory (
 id int NOT NULL,
 organization_id int NOT NULL,
 farm_id int NOT NULL,
  primary key(id),
 FOREIGN KEY (organization_id) REFERENCES organizations (id),
 FOREIGN KEY (farm_id) REFERENCES farms (id)
);

CREATE TABLE inventory_updates (
  id int NOT NULL,
  date date NOT NULL,
  inventory_id int NOT NULL,
  quantity_added int,
  quantity_removed int,
  FOREIGN KEY(inventory_id) REFERENCES inventory (id)
);

CREATE TABLE inventory_expired (
  id int NOT NULL,
  inventory_id int NOT NULL,
  date date NOT NULL,
  quantity_expired int NOT NULL,
  FOREIGN KEY(inventory_id) REFERENCES inventory (id)
);

CREATE TABLE orders (
  id int NOT NULL,
  fulfillment_date date NOT NULL,
  seller_org_id int NOT NULL,
  primary key(id),
  FOREIGN KEY(seller_org_id) REFERENCES organizations (id)
);

CREATE TABLE order_item (
 id int NOT NULL,
  order_id int not null,
 inventory_id int NOT NULL,
 quantity int NOT NULL,
 FOREIGN KEY(inventory_id) REFERENCES inventory (id),
  FOREIGN KEY(order_id) REFERENCES orders(id)
);

And here is my current function, which works for calculating inventory for dates up to today but not in the future. I put a hard coded date in (2021-02-21) for now.

SELECT t.inventory_id, 
COALESCE(t.total_quantity, 0) as total_quantity, 
COALESCE(exp.quantity_expired_previously, 0) quantity_previously_expired, 
COALESCE(sold.quantity_sold, 0) as quantity_sold,
COALESCE(t.total_quantity, 0) - COALESCE(exp.quantity_expired_previously, 0) - COALESCE(sold.quantity_sold, 0) as quantity_available
FROM
(SELECT u2.inventory_id as inventory_id, SUM(COALESCE(u2.quantity_added, 0) - coalesce(u2.quantity_removed, 0)) as total_quantity
  FROM inventory_updates u2 
  LEFT JOIN inventory i2 on i2.id = u2.inventory_id
  LEFT JOIN farms f2 on f2.organization_id = i2.organization_id
  WHERE u2.date <= ('2021-02-21'::date)
  GROUP BY u2.inventory_id) t
LEFT JOIN (
  SELECT e.inventory_id, COALESCE(SUM(e.quantity_expired), 0) as quantity_expired_previously
  FROM inventory_expired e
  WHERE e.date <= ('2021-02-21'::date)
  GROUP BY e.inventory_id
) exp on exp.inventory_id = t.inventory_id
LEFT JOIN (
  SELECT i.inventory_id, SUM(i.quantity) as quantity_sold
  FROM order_item i
  LEFT JOIN orders o on o.id = i.order_id
  LEFT JOIN farms f on f.organization_id = o.seller_org_id
  WHERE o.fulfillment_date < ('2021-02-21'::date + (f.inventory_expiration_days * interval '1 day'))
  GROUP BY i.inventory_id
) sold on sold.inventory_id = t.inventory_id

I was wondering if some sort of union on all of the updates that occur to the inventory quantities could work better, but also think I am going to run into problems when I have lots of rows, and any sort of sum over everything is going to take a long time. Basically, I am pretty stuck.

Best Answer

I'm ignoring efficiency concerns for the moment, because I'm not convinced it's a problem. Try this route and see if it's too slow.

Use the FILTER clause on aggregate functions to only include those rows that should be valid as-of the date in question. Here's an example that only takes quantity_added and the expiration time into account:

-- Set up 1 farm with a 10-day expiration
insert into organizations select 1;
insert into farms select 2, 1, 10;
insert into inventory select 3, 1, 2;
-- +100 inventory on feb 1, +50 on feb 2
insert into inventory_updates select 4, '2021-02-01', 3, 100, 0;
insert into inventory_updates select 4, '2021-02-02', 3, 50, 0;
with date_range as (
  select d from generate_series('2021-02-01'::date, '2021-02-15', interval '1 days') d
  )
  select d, inventory_id,
  sum(quantity_added) FILTER (
    where inv_upd.date <= d and inv_upd.date > (
      d- (interval '1 days' * f.inventory_expiration_days)
    )) as running_total
from date_range
cross join inventory_updates inv_upd
join inventory on inv_upd.inventory_id=inventory.id
join farms f on f.id=inventory.farm_id
      group by 1, 2
      order by 1, 2;

Which produces:

| d                        | inventory_id | running_total |
| ------------------------ | ------------ | ------------- |
| 2021-02-01T00:00:00.000Z | 3            | 100           |
| 2021-02-02T00:00:00.000Z | 3            | 150           |
| 2021-02-03T00:00:00.000Z | 3            | 150           |
| 2021-02-04T00:00:00.000Z | 3            | 150           |
| 2021-02-05T00:00:00.000Z | 3            | 150           |
| 2021-02-06T00:00:00.000Z | 3            | 150           |
| 2021-02-07T00:00:00.000Z | 3            | 150           |
| 2021-02-08T00:00:00.000Z | 3            | 150           |
| 2021-02-09T00:00:00.000Z | 3            | 150           |
| 2021-02-10T00:00:00.000Z | 3            | 150           |
| 2021-02-11T00:00:00.000Z | 3            | 50            |
| 2021-02-12T00:00:00.000Z | 3            |               |
| 2021-02-13T00:00:00.000Z | 3            |               |
| 2021-02-14T00:00:00.000Z | 3            |               |
| 2021-02-15T00:00:00.000Z | 3            |               |

To finish this, you'd need to also join the orders table and subtract out those, as well as using something like sum(quantity_added - quantity_removed) FILTER (...) for the running_total expression. And, of course, you'd just interpolate your date, the date_range thing is just to illustrate the running-total behavior.

Related Solutions

Database Design – Supertyping Inventory Transactions

Try making an inventory transaction table. This table will likely consist of all the columns common to your various transaction type records. For some record types, this may be all the columns. Add a transaction type column to differentiate the records.

You can then create an optional relationship to any subtypes which will contain the columns specific to the subtype.

Alternatively, you can add nullable columns for this data. This will require additional constraints to ensure only the right columns are set.

EDIT: If you have null columns, there will usually be the only table. The shipment column would be required for all types having a shipment id (might include returns), and NULL for other types. The alternative would be to have rows in InventoryShipments for shipments, and rows in InventoryReturns for returns. This quickly gets complicated.

If certain (transaction) types have a lot of type specific columns, you could use a hybrid model, with most types entirely in the Inventory table and some types with type specific table. This may occur when you use sub-type tables, as there may be types which only have columns which are common to all types.

Using either approach requires effort to ensure data integrity. Null columns should always be null for columns, and not null when required for the records type. If there are subtype records, there usually must be only one per inventory record.

Sql-server – Calculating Average Excluding Outliers

A very basic example would be to get the AVG and STDEV of the range of numbers and then exclude any that were more than 1 Standard Deviation from that average.

You then take the average of the new range.

This is quite a basic bit of code (don't forget the CAST to a DECIMAL) which you can expand upon to make it more suitable to your needs.

CREATE TABLE #nums (num INT);

INSERT INTO #nums VALUES (1), (2), (1), (2), (7);

WITH AvgStd AS (
  SELECT
    AVG(num) AS avgnum,
    STDEVP(num) AS stdnum
  FROM #nums AS n  
)
SELECT 
  AVG(CAST(num AS DECIMAL(5,2)))
FROM #nums AS n
CROSS JOIN AvgStd
WHERE num < (avgnum + stdnum)
  AND num > (avgnum - stdnum);

DROP TABLE #nums;

Best Answer

Related Solutions

Database Design – Supertyping Inventory Transactions

Sql-server – Calculating Average Excluding Outliers

Related Question