Postgresql – Define Field as a Foreign Key

database-designpostgresqlschema

I am receiving data from an external source in batches that I need to import to my PostgreSQL database. I have 2 main tables: clients and providers. Ideally, they should have a 1:1 relationship, however, this isn't always going to be true with the data I receive. There are cases where a client will be sent in a batch and it's provider counterpart will only be sent in a later batch (or vice versa).

My question is what is the best way to work around this issue? Currently, I have the two tables that are unlinked but have the potential to be if the "keys" are present in both tables but something tells me this isn't the best way to go about things.

Looking forward to hearing your thoughts!

Many thanks

Best Answer

There are two kinds of foreign keys: declared foreign keys and undeclared foreign keys. Most people are familiar with declared foreign keys, but some don't realize that there is such a thing as an undeclared foreign key.

There are several advantages to declaring a foreign key. The big advantage is rejecting invalid data at input time. For interactive data entry, this is very valuable.

In your case, however, it's a detriment. You probably want to store all the data you are given, even if some of it appears to be dirty data. Later, you can go through some kind of cleansing operation where you detect certain kinds of errors, like orphaned references, and flag them for human intervention.

You would use this field in join conditions precisely the way you use foreign keys. This is what is meant by an undeclared foreign key. If there are any orphaned references, they will drop out of the join, and you need to program accordingly.

In a more formal situation, like a data warehouse, incoming data is loaded into a staging table at first, and only added to the warehouse data after cleansing. That's probably overkill in your case.