Better DB design to have a SUM column, or to constantly add up all relationship items per query

database-design

I have a database with an invoices table and invoice_items table. For simplicity, let's just say theinvoices table has the following structure:

| id | name | created_at |

The invoice_items has this structure:

| id | invoice_id | tital | created_at |

Pretty simple, a parent invoice table and the line items go into invoice_items with a relationship setup.

What I'm wondering is, how effective it is to has a sum column on the invoices table? Would it be bad to have all invoice_items per invoice added up on the fly to show the invoice total amount? Or would it be better to have a SUM column which is static, and is pulled automatically when invoices are listed – then updated specifically whenever an invoice changes?

Just trying to weigh up the pro's and con's – because whenever an invoice changes, the SUM column needs to be updated whereas if it was just always calculated based on the line items dynamically, would that be a drastic design flaw or be acceptable?

Best Answer

As ever - it depends.

If you have a blisteringly quick DBMS or only a handful of items in each invoice, then totalling them up on the fly would be the cleanest (and, therefore, "best") option.

The "Ivory-Tower Database Purists" would say that you should never store anything that you can derive dynamically (citing Date-of-Birth vs Age). Realistically, it depends on just how long it takes to do that derivation!

If you have an "ordinary" database, or hundreds or even thousands of items in each invoice and you need that summarised total frequently, then maintaining the summary field at the Invoice level might be the better solution. [Per-row] Triggers are the way to achieve this, so that no update to any item gets missed.

All that said, if I were in your position, I'd go with on-the-fly totalling ... at least until the application's performance began to suffer, and then I'd think about "optimising" this aspect of it.