How to design a relational DB to capture the same entity as reported by multiple sources

database-design

I need a DB that can capture instruments as reported by a bank and two accounting firms. Let's take Walmart stock as an example. The bank may have a instrument named "Wal-Mart Stock". One of the accounting firms may have it named as "Walmart Stock". The other accounting firm may have "Wal-mart common stock".

Here's the diagram that I have so far:

ER

* Table contains five additional columns. These five column names are the same for MasterInstrument and DataSourceInstrment.

The DataSource represents the bank, accounting firms, or any other entity that can provide data.

The DataSourceInstrument represents the instrument data as provided by the DataSource.

MasterInstrument is a version of the instrument specific to the application itself. So, in our example, it may contain "Walmart Common Stock" for the instrument name. It's different than DataSourceInstrument b/c unlike DataSourceInstrument, it is used extensively throughout the application and it must maintain a history of changes.

The Instrument table provides a way to know that several different data sources are referring to the same instrument. DataSourceInstrument and MasterInstrument with the same instrumentId are referring to the same instrument. (Instrument.id is a surrogate key b/c some instruments do not have a natural key, and the DataSourceInstruments and a MasterInstrument may need to be manually mapped as referring to the same instrument.) So, using the Walmart example, we can know that the bank, two accounting firms and the internal system are referring to the same Walmart instrument.

The Issuer represents the issuer of the instrument such as "Wal-Mart Stores, Inc." Note: It only needs to be captured for MasterInstruments.

I have a couple of doubts on this design:

  1. Is it poor design to have a table (i.e. Instrument) with only a single surrogate column? If this is a problem, how would it be designed differently?
  2. Are the various *Instrument table names confusing? If so, how could it be more clear?

Overall how should the design account for the need to capture multiple sources referring to the same instrument along with the need to capture an internal version of that instrument with history?

Update

The system's primary function / raison d'ĂȘtre is reporting. It must report for a given date what the internal instrument (i.e. MasterInstrument) data was on that date. This is why I have MasterInstrument as a history /temporal table and do not have the columns in Instrument.

Best Answer

Is it poor design to have a table (i.e. Instrument) with only a single surrogate column? If this is a problem, how would it be designed differently?

If that is the only constant property of the entity then that is fine, but I would assume that there are other invariant properties such as a name ("Walmart" in your example). Even if the same "instrument" can be referred to by different names by each source and issuer (those names then belong in the relation table) do they not have a canonical name?

Are the various *Instrument table names confusing? If so, how could it be more clear?

It depends a lot on how well you document them and/or the names and naming conventions in your target industry.

Table names should be as descriptive as possible (you can always shorten them in queries using aliases) avoiding generic words where possible. Where a generic name is needed because nothing more speciifc isn't too specific then I would suggest the simplest generic word that fits (item, object, thing) rather than instrument, but that is personal preference.

For junction tables (relation tables, or whatever you call them as there are several common names for the same concept) I recommend the name represents the row entity types in the relationship - so IssuerInstrument rather than MasterInstrument in your diagram.