I've been developing some database structure to cover the following business rules:
- An invoice has many invoice_items.
- An invoice may have many refunds
- A refund has many invoice_items.
- All items should be from the same invoice.
- A refund should always have items.
So I came up with the entity-relationship diagram (ERD) shown bellow:
I'd like to see which invoice_items are included in the refund.
However, in this case, it would not be possible to add a FOREIGN KEY (FK) (InvoiceItems => refunds
). The reason is that an invoice may or may not have refunds.
So I modified the diagram as follows:
Here I could add the FK (refund_invoice_items
=> invoice_items
)
Questions
I think this can work around the FK problem, so: Do you think this is an appropriate approach or have any ideas about it?
The main question is that if I could store NULL marks in the FK column, the first example would be better? (In my case I always add NOT NULL constraint to columns defined as FKs)
Best Answer
To me, your first diagram is good enough (in physical model perspective) with details like the following:
The data generation flow is
However, for a pure normalized data model, your second model is more robust (note, your current model diagram has a typo, the relationship between REFUNDS and REFUNDS_INVOICE_ITEMS should be 1:N not N:1). The major advantage of model 2 is that it can meet your integrity requirement of (while model 1 cannot meet):
note: I added (refund)
But model 1 is better for real world performance (fewer joins), and I often would sacrifice space for performance. In this case, model 1 will have lots of invoice_items with refund_id NULL, while in model 2, you do not have such concerns.
So at the end of the day, you need to balance your concerns between performance/convenience and strict business requirements. In Model 2, your integrity is guaranteed by database itself and you need to guarantee your integrity at application level in model 1.