Database Design – Should a FOREIGN KEY Column Be NULLable?

database-designerdforeign keyschema

I've been developing some database structure to cover the following business rules:

  • An invoice has many invoice_items.
  • An invoice may have many refunds
  • A refund has many invoice_items.
  • All items should be from the same invoice.
  • A refund should always have items.

So I came up with the entity-relationship diagram (ERD) shown bellow:

ERD 1

I'd like to see which invoice_items are included in the refund.

However, in this case, it would not be possible to add a FOREIGN KEY (FK) (InvoiceItems => refunds). The reason is that an invoice may or may not have refunds.

So I modified the diagram as follows:

ERD 2

Here I could add the FK (refund_invoice_items => invoice_items)

Questions

I think this can work around the FK problem, so: Do you think this is an appropriate approach or have any ideas about it?

The main question is that if I could store NULL marks in the FK column, the first example would be better? (In my case I always add NOT NULL constraint to columns defined as FKs)

Best Answer

To me, your first diagram is good enough (in physical model perspective) with details like the following:

[invoice table]
invoice_id primary key
....

[refund table]
refund_id primary key,
invoice_id NOT NULL, -- FK
....

[invoice_item table]
invoice_item_id primary key,
invoice_id NOT null, -- FK
refund_id NULL  -- FK
....  

The data generation flow is

  1. An Invoice is created in [invoice table]
  2. One or more Invoice_items are created in [invoice_item table] referring to the invoice_id in step 1.(note invoice_id canNOT be NULL as defined in [invoice_item table], while refund_id can be null)
  3. When a refund record is generated, an associated invoice_id will be inside the record, and one or more invoice_item records (belonging to that invoice_id) will consequently be updated for their refund_id column.

However, for a pure normalized data model, your second model is more robust (note, your current model diagram has a typo, the relationship between REFUNDS and REFUNDS_INVOICE_ITEMS should be 1:N not N:1). The major advantage of model 2 is that it can meet your integrity requirement of (while model 1 cannot meet):

All (refund) items should be from the same invoice.

note: I added (refund)

But model 1 is better for real world performance (fewer joins), and I often would sacrifice space for performance. In this case, model 1 will have lots of invoice_items with refund_id NULL, while in model 2, you do not have such concerns.

So at the end of the day, you need to balance your concerns between performance/convenience and strict business requirements. In Model 2, your integrity is guaranteed by database itself and you need to guarantee your integrity at application level in model 1.