Database Design – 3rd Party Platform Unique Identifier vs Internal Identifiers

database-designunique-constraint

What is the general practice when it comes to choosing a unique identifier? In my case I have the option to use the eBay unique listing ID (Item ID) or my own internal ID for use in table joins and general identification. The argument could be made that some 3rd party identifiers are non-numeric and I would understand this proposition but in my specific case the identifier is numeric. Thanks for any advice or personal experiences in advance!

Best Answer

To be safe you'll want both. Internally you should create and use your own unique IDs to maintain a clean relational model with FKs between your tables while keeping a reference to the eBay unique ID in an appropriate table. If you need to reference something by the eBay ID you will have it available, but should the eBay unique ID rules change it does not affect the consistency of your existing table relations and architecture.

Related Solutions

Mysql – Duplicate Relationships in Rails/MySQL Healthcare Application

This question if part database schema and part rails. From a Rails perspective, you can create as many aliased relationships as you like so normalization isn't as much of a concern there.

Without seeing the schema and just going based off of what you have written above though, I'd say that Dosage doesn't belong in a table with it's own model.

Dosage is effectively

Amount: 10 Unit Of Measure: mg

Writing things like 10mg into a table and giving them an id doesn't make a ton of sense. Organizing the types of units of measure that you have available or even standardizing by converting everything to milligrams in the database so that you can just do conversions when displaying would probably make the most sense.

Then you eliminate the dosages table completely and simplify to:

therapies
 - id
 - drug_id
 - milligrams

And Prescription becomes:

prescription
 - id
 - therapy_id
 - condition_id

Now, usage is the outlier here because it's where your only real "duplicate" relationship is but everything you need to link drugs to conditions can be inferred from your prescription data, effectively like so:

SELECT DISTINCT prescriptions.condition_id, therapies.drug_id
FROM prescriptions
  INNER JOIN therapies ON prescriptions.therapy_id = therapies.id

You could use a query like the above to create a view called usages that could be used by a read only Usage model in Rails. If performance became a concern you could use that to create a cached usages table and setup a database trigger to update the cache table when a new prescription is added though.

Mysql – Integer ID vs varchar ID with JOIN

So, what is supposed to be the best practice if both are not valid because criteria are balanced?

The criteria aren't likely balanced. They're almost never going to be balanced. The cost of the extra join to look up the surrogate key (that's the MEDIUMINT; a surrogate key is an opaque value, typically implemented as an auto-increment in MySQL, which has no intrinsic meaning outside the database and is ideally invisible to the user) is going to be negligible compared to the large storage, larger indexes (which means fewer rows per index page, which means more RAM for the same performance).

And cascading updates across several tables? Yikes. That means unnecessary I/O, bringing with it more row locks, index locks, gap locks, more potential cases for deadlocks...

No, I don't think the two apparent alternatives are likely to be genuinely balanced. Go with the surrogate key.

There are database purists out there who don't like surrogate keys, and if memory serves me, these may be the same people who spend too much time thinking about about theoretical databases and don't like NULL. Don't listen to those people too extensively, brilliant theoreticians they may be. (Dare I add, "caveat lector").

Best Answer

Related Solutions

Mysql – Duplicate Relationships in Rails/MySQL Healthcare Application

Mysql – Integer ID vs varchar ID with JOIN

Related Question