Database Design – Should a FOREIGN KEY Column Be NULLable?

database-designerdforeign keyschema

I've been developing some database structure to cover the following business rules:

An invoice has many invoice_items.
An invoice may have many refunds
A refund has many invoice_items.
All items should be from the same invoice.
A refund should always have items.

So I came up with the entity-relationship diagram (ERD) shown bellow:

ERD 1

I'd like to see which invoice_items are included in the refund.

However, in this case, it would not be possible to add a FOREIGN KEY (FK) (InvoiceItems => refunds). The reason is that an invoice may or may not have refunds.

So I modified the diagram as follows:

ERD 2

Here I could add the FK (refund_invoice_items => invoice_items)

Questions

I think this can work around the FK problem, so: Do you think this is an appropriate approach or have any ideas about it?

The main question is that if I could store NULL marks in the FK column, the first example would be better? (In my case I always add NOT NULL constraint to columns defined as FKs)

Best Answer

To me, your first diagram is good enough (in physical model perspective) with details like the following:

[invoice table]
invoice_id primary key
....

[refund table]
refund_id primary key,
invoice_id NOT NULL, -- FK
....

[invoice_item table]
invoice_item_id primary key,
invoice_id NOT null, -- FK
refund_id NULL  -- FK
....

The data generation flow is

An Invoice is created in [invoice table]
One or more Invoice_items are created in [invoice_item table] referring to the invoice_id in step 1.(note invoice_id canNOT be NULL as defined in [invoice_item table], while refund_id can be null)
When a refund record is generated, an associated invoice_id will be inside the record, and one or more invoice_item records (belonging to that invoice_id) will consequently be updated for their refund_id column.

However, for a pure normalized data model, your second model is more robust (note, your current model diagram has a typo, the relationship between REFUNDS and REFUNDS_INVOICE_ITEMS should be 1:N not N:1). The major advantage of model 2 is that it can meet your integrity requirement of (while model 1 cannot meet):

All (refund) items should be from the same invoice.

note: I added (refund)

But model 1 is better for real world performance (fewer joins), and I often would sacrifice space for performance. In this case, model 1 will have lots of invoice_items with refund_id NULL, while in model 2, you do not have such concerns.

So at the end of the day, you need to balance your concerns between performance/convenience and strict business requirements. In Model 2, your integrity is guaranteed by database itself and you need to guarantee your integrity at application level in model 1.

Questions

Best Answer

Related Solutions

Ms-access – Should I maintain relationships diagrams in the back-end and the front-end

ERD for a web development company

Related Question