Storing external databases data into one

database-design

I need to store different datasets coming from a provider where each dataset has it's own release path.
These datasets can be combined together to get the full picture of the data available from the provider.

I know from the doc of the publisher the combination of versions that are allowed.
My pain point is that I need to keep track of the version for each dataset.

Example of data:

"Main" dataset from publisher "ABC" has version "1.0".
"Ext" dataset from publisher "ABC" has version "release_3"

My schema is as follow:

Provider
ProviderId

Dataset
DatasetId
ProviderId

Version
VersionId
DatasetId

Main
MainId
VersionId

Ext
ExtId
MainId
VersionId

Based on that, is it a problem that FK "VersionId" in tables "Main" and "Ext" references a different record in table "Version" ?

I'm afraid that any user querying the DB will not expect to have diverging Version (as the FK name is identical in both tables).
Unfortunately that's the reality of the data provided by the publisher.

Is there a better design to accomplish the same result ?

NB: It is possible that in the future, datasets from different providers need to be combined.

Thanks

Best Answer

Technically your Version.DatasetId gets you back to the Dataset table and then you can determine the Provider by ProviderId. So while the Version table can be a little confusing by VersionId, as long as there's no overlapping VersionIds in Ext and Main then you should be ok. Otherwise you'll need to add the DatesetId to Ext and Main too.

From a design perspective for other developers and users, you can create a view that pre-joins the Ext and Main to Version and provide that view to them instead. You can even add a column called IsMain to the view to be extra explicit if you feel.