Sql-server – What’s a good relational structure for units and complex unit conversions

relational-theoryschemasql server

My company is in the Energy industry and I need to come up with a good way to represent the conversion of units of measurement. I've done some searching and have yet to find a good article covering this at the depth I need. Most info published about unit conversions assume that given Unit 1 there is a known (hard-coded) conversion rate to get to Unit 2 and it's simple math (this being the most complex example I've found which still doesn't help). However, this isn't always true in the real world and certainly isn't true for what we must handle. (Sorry for the long write-up – I'm trying to provide as much info as possible!)

Tricky example 1:
Some conversions vary with time, such as converting $5 to Euros or vice versa. This sounds like it has nothing to do with the energy but it really does in the energy commodities market (think stock market).

Tricky example 2:
(Oversimplified) Some natural gas burns hotter than others. In addition, natural gas can be measured/stored either based on the Energy in the gas (such as Therms) OR based on the Volume of said gas (such as MCF which is 1000 Cubic Feet), and there are other possibilities too (such as Ton for Mass). An analogy of gasoline is that 1 gallon of 87-octane Unleaded provides less energy than 1 gallon of 93-octane Unleaded.

Tricky example 3:
In addition to having these units of measurement, we also often have to deal with rates, such as $ per Therm or € per MCF. So we need some way to work with these rates and how they relate to the base units so if we need to convert from $ per Therm to € per MCF, we can and it utilizes the same published rates as converting from Therm to MCF.

Tricky example 4:
Previously, I've used the term Energy very loosely and possibly sometimes incorrectly. At this point and from now on, that's changing. So the last curveball is that we deal with both Energy as well as Power. With electricity, this means kWH versus kW (a fairly good explanation despite it being Yahoo Answers). A data analogy: it would be like comparing total MB of data downloaded versus your Mbps bandwidth that your ISP provides you. Like data, energy takes time to be delivered. Continuing on with the data analogy, we might have to calculate the average effective bandwidth consumed over a period of time, so given that 60MB was downloaded over 1 minute, the "effective" rate would be 60*8/60 = 8Mbps. The "trick" here is that if we stored Mbps as a unit itself, we need some way to relate it directly to a MB as well even though it also involves a time component. FORTUNATELY, converting from Energy to Power (or vice versa) is a fairly rare thing for us to have to do, so our solution should be optimized for all of the other tricky examples and hopefully allow for this one as well, but not handling relating Energy to Power is an option.

Tricky example 5:
This is essentially 3 + 4. We may have both $ per KW as well as $ per KWh, so the rates deal with both Power and Energy.

Easy example:
Some conversions are very easy and these are the ones that most info on the web can handle. 1000 Wh = 1kWh and such. Same thing with Therms and Decatherms or kW to MW, etc. I don't need help here but keep in mind that ~70% of our conversions will be of this type.


My thoughts on how to start but unsure on how to finish:

  1. This is clearly VERY messy so I am proposing that we choose a standard unit of measurement to store all data in for each commodity and "usage type". So for electricity, our standard energy unit would be kWH and our standard power unit would be kW. So to convert to any other energy/power units, we would only need a conversion rate to/from our standard and not every possible combination. If we ever need to convert from MW to W, we can always do it by way of converting to/from kW.
  2. Since conversion rates may be dependent upon a specific time, we must allow the ability for this time to be stored in relation to the measurement. I suspect we need not worry about these conversion rates changing any faster than once an hour and we might even be able to assume once a day.
  3. Since conversion rates may be dependent upon published values, we must allow for the ability for this value to be stored in relation to the measurement. I suspect we need not worry about these conversion rates changing any faster than once an hour and we might even be able to assume once a day.
  4. After this is all figured out, I anticipate creating a web service that does nothing more than handling all unit conversions. I'm NOT looking for SQL to perform these conversions and I can do some creative caching to make it so I'm not absolutely hammering these tables but it will need to sometimes handle converting ~400 values per page load in the user-accessed website. I'm not sure if/how this matters.

I don't know at what level I should store the conversion rates that never change versus the conversion rates that do change, and exactly how to key off of them in a way that will allow me to quickly access these in a way that's easy to work with.

Any thoughts on how to tackle this or even some published reading material that might be able to help? I'm using SQL Server (soon to be SQL Azure) but this shouldn't really matter. The schema to properly represent this is what I'm having trouble with here. If it were as simple as inches versus centimeters, it's easy. But the varying conversion rates are the problem here.

Best Answer

There are a few things you want to factor into your design:

1. Measurements Need a Timestamp

Make sure all of your measurements have an indication of:

  • Scalar Value
  • Unit of Measurement
  • Date and Time the Measurement was Taken

This will allow you to work with measurements that need time-dependent conversion calculations.

2. Units of Measure Have Attributes

Each unit of measure has a few different attributes. The obvious ones are indicative, like a code and maybe a descriptive name. There are also a couple of critical other attributes to keep for each unit of measure. (i) Unit Type and (ii) Conversion Factor to the Base Unit.

The first tells you whether your unit of measure is a length, a weight, energy, power, currency, etc. etc. It should also tell you what the base unit of measure is. You should pick exactly one for each unit type. You can use things like kWh if you like, but I'd stick to the base SI units (as applicable) if I were you.

The second tells you what your unit of measure needs to be multiplied by to get it to the base. I mentioned that this is an attribute of your UOM, but in fact it needs to be in a child table. The business key of the child table that holds this base conversion factor is the combination of the UOM, its base unit type and a date/time. I would keep both an effective and an expiry date/time on the base conversion factor table. This allows you to quickly find the right rate that applies at any particular point in time. If it happens to be a rate that doesn't change, that's OK. Just use a min-collating effective date and a max-collating expiry date for the one record.

3. Trying to Table-Drive Everything Will Make You Nuts The last piece of the puzzle is determining the calculation for moving from one kind of unit to another kind of unit. You could try to table-drive this kind of calculation but in the end the tricky ones are going to make the design so general (read complicated and slow) that it will be impractical. Instead, create a code-table of conversion calculations and use it to link one kind of Unit Type to another kind of Unit Type. Perform the actual calculations in some code somewhere. Which piece of code you use for any given conversion is what the code table tells you. How the calculation is performed is just in the code. You can have one calculation each for the various easy things, like area needs two lengths and volume needs three lengths as well as the harder ones like work needs energy and time.

When you get the details of your design figured out you should blog it and come back here to post a link!