How to store data in look-up tables when some of it may be semantically different, but logically the same

database-designtable

I am working on a system that receives communications from various laboratories. Each laboratory has two separate, independently operated data services which means that Laboratory A and Laboratory B could send the same data to the system, but they may send it in different formats (if they use different systems). I'm trying to figure out the best way to handle this. Here's an example:

The system I am working on receives patient appointment status information, specimen collection information and final laboratory test results.

When Laboratory A sends appointment status information or specimen collection information, it comes from one system. Part of the information they send might be XML such as:

<CollectionStatus>INPRC</CollectionStatus>
<ReasonForTest>PRE</ReasonForTest>
<IdSampleType>UR</IdSampleType>

This indicates to our system that the Urine (UR) specimen collection is in process (INPRC) and the reason for that test is "Pre-Employment" (PRE).

Later on when Laboratory A sends the final laboratory test results for the same test they may send XML such as:

<CollectionStatus>OK</CollectionStatus>
<ReasonForTest>PRE-EMPLOYMENT</ReasonForTest>
<IdSampleType>R</IdSampleType>

This indicates to our system that the Urine (R) specimen was collected successfully (OK) and that the reason for the test is "Pre-Employment" (PRE-EMPLOYMENT).

The same lab may send the same information multiple times, but because they come from different systems at the same laboratory, the data sent may be different.

The same applies to different laboratories. Laboratory B may send:

<CollectionStatus>2</CollectionStatus>
<ReasonForTest>PE</ReasonForTest>
<IdSampleType>U</IdSampleType>

and

<CollectionStatus>1</CollectionStatus>
<ReasonForTest>1</ReasonForTest>
<IdSampleType>UR</IdSampleType>

To handle this, right now I have three tables for each differing variable. Using ReasonForTest as an example I would have these tables:

ReasonForTest table schema

The table "TestReasons" holds data that is standard for our system. Such as (for example):

| TestReasonID | Reason                         |
-------------------------------------------------
|            1 | Pre-Employment                 |
|            2 | Post-Accident                  |
|            3 | Return to Duty                 |
|            4 | Promotion                      |

The other tables, "Results.TestReasons" and "Orders.TestReasons" are simply there to allow the system to convert the data it receives from laboratories into the data our system understands and needs to operate and if necessary, convert it back. The "Name" field holds text that is "UX friendly" and is typically shown on reports or our system's front facing UX. The "Value" field holds the data that the laboratory identified in the "LaboratoryID" field would send (such as "PRE" or "PRE-EMPLOYMENT"). Since not all laboratories may test for the same reasons (the above examples are some typical drug testing reasons), it's possible that the "Results.TestReasons" and "Orders.TestReasons" may not have a record that matches every entry in the "TestReasons" table.

Since there are (at the moment at least) 13 different data types that may be sent at any given time, that ends up being 39 different tables to simply convert from one data type to another. I don't know why, but this just sits funny with me and I feel like there's probably a better way to do this.

Am I missing something or does this seem like the best way to achieve this?

Best Answer

Am I missing something or does this seem like the best way to achieve this?

Item 1

You seem to be missing the part of the contract with the laboratories that state "you will send us data in this format.". That document should clearly list the valid values for TestReason (even if it is their list of values).

Design Idea 1

receives patient appointment status information, specimen collection information and final laboratory test results.

I'm pretty sure there are ISO/Government standards for accomplishing this. CDISC comes to mind.

I would look at various Government web sites to see if they have a standardized lists of Sample Types. The Open Data StackExchange might be able to help you out (or even state that this does not exist).

Design Idea 2

Natural Keys.

In both your tables (Order.TestReasons and Results.TestReasons) should have a Unique Constraint on (LaboratoryId, Value). These can be your PK. You may want to investigate why they are two separate set of values.

Design Idea 3

You should already have a TestReasons dimension table (and a table for each of the other 12 dimensions). That would narrow your requirement down to 26 tables. Then, you just need to reduce the two tables into one. (eg add a "Domain" column that indicates "Orders" vs "Results")