“Measure Type Dimensions” in “Accumulating Snapshot” fact table

data-warehousedatabase-design

I have an accumulating snapshot fact table that tracks the entry and exit of containers in a terminal.

The containers can enter and exit in 3 different ways, so I thought to create a specific dimension table that lists these 3 possible ways (train, vessel or truck).

Then I read this article which basically says that this technique is wrong, but I can't understand why.

First article:

Sometimes when a fact table has a long list of facts that is sparsely populated in any individual row, it is tempting to create a measure
type dimension that collapses the fact table row down to a single
generic fact identified by the measure type dimension. We generally do
not recommend this approach. Although it removes all the empty fact
columns, it multiplies the size of the fact table by the average
number of occupied columns in each row, and it makes intra-column
computations much more difficult. This technique is acceptable when
the number of potential facts is extreme (in the hundreds), but less
than a handful would be applicable to any given fact table row.

I understand that if a "Measure Type Dimension" is implemented for a transaction fact table it can create problems like this other article says, but I can't see any downside if used for an accumulating snapshot fact.

Second article: (some downsides of implementing a "Measure Type Dimension")

  1. […] If we go with a "Measure Type Dimension" we will lose this analytic ability. If one measure is not compatible with the other
    measures, we can’t add them up.
  2. […] The more number of passes our SQL need to run to produce a report, the slower the report.
  3. […] On the BI tool, if you don’t put the measure type filter, you are risking the user getting “rubbish information”. From usability
    point of view, this design is a rubbish.

Response to Mark Storey-Smith's answer

Very nice approach, i would have never thought about that.

Another thing: every entry and exit of a vehicle that brings container into the terminal has a unique ID which gives me other informations like: expected arrival of the vehicle, actual arrive, if it's a vessel the dock, if it's a truck the tollbooth and many other informations…

These are 3 different fact tables and they must be linked somehow to the container fact table.

I thought that the ID of the voyage is a degenerate dimension, so it would directly go into the container fact table. So, my doubt is: should i add 6 different fields in the container fact table (vessel_voyage_in_key, vessel_voyage_out_key, train_voyage_in_key, train_voyage_out_key, truck_voyage_in_key, truck_voyage_out_key) or just 2 other fields (voyage_in, voyage_out) that dynamically links to the various voyage tables?

I hope my doubt is clear, thank you.

Best Answer

I believe the guidance is referring to a wide fact table where the majority of measure values are null:

CREATE TABLE dbo.SparseFact
(
    Dim1Key     INT NOT NULL
    , Dim2Key   INT NOT NULL
    , Dim3Key   INT NOT NULL
    , Dim4Key   INT NOT NULL
    , Dim5Key   INT NOT NULL
    , Value1    INT NULL
    , Value2    INT NULL
    , Value3    INT NULL
    , Value4    INT NULL
    , Value5    INT NULL
    , Value6    INT NULL
    , Value7    INT NULL
    , Value8    INT NULL
    ..
    , Value101  INT NULL
    , Value102  INT NULL
    , Value103  INT NULL
);

The suggestion is that some people will see all the nulls and decide to do this instead:

CREATE TABLE dbo.DontDoThisFact
(
    Dim1Key             INT NOT NULL
    , Dim2Key           INT NOT NULL
    , Dim3Key           INT NOT NULL
    , Dim4Key           INT NOT NULL
    , Dim5Key           INT NOT NULL
    , MeasureTypeKey    INT NOT NULL
    , Value             INT NOT NULL
);

Not good.

In your scenario I think I'd be looking at something like this, which is very different to the scenario described in the articles you referenced.

CREATE TABLE dbo.InventoryFact
(
    ContainerKey        INT NOT NULL
    , TransportTypeKey  TINYINT NOT NULL
    , EntryDateTime     DATETIME NULL
    , ExitDateTime      DATETIME NULL
);

CREATE TABLE dbo.TransportType
(
    TransportTypeKey    TINYINT IDENTITY(1,1) NOT NULL
    , EntryTransport    CHAR(10) NOT NULL
    , ExitTransport     CHAR(10) NOT NULL
);

INSERT
    dbo.TransportType
SELECT
    EntryTransport
    , ExitTransport
FROM
    (
    SELECT EntryTransport = 'Train'
    UNION
    SELECT EntryTransport = 'Truck'
    UNION
    SELECT EntryTransport = 'Vessel'
    UNION
    SELECT EntryTransport = 'N/A'
    UNION
    SELECT EntryTransport = 'Unknown'
    ) en
CROSS JOIN
    (
    SELECT ExitTransport = 'Train'
    UNION
    SELECT ExitTransport = 'Truck'
    UNION
    SELECT ExitTransport = 'Vessel'
    UNION
    SELECT ExitTransport = 'N/A'
    UNION
    SELECT ExitTransport = 'Unknown'
    ) ex;

To the additional questions...

I would add ExpectedEntryDate, ExpectedExitDate to the Container/InventoryFact. Less certain, without visibility of all the data elements, I would probably put EntryVoyageId and ExitVoyageId in a separate junk dimension together as one row along with any other degenerate data items (identifiers for the truck, train etc).

I would add 3 new dimensions for VesselVoyage, TruckVoyage and TrainVoyage and 6 Voyage keys (inbound/outbound) to this one fact (it's 6 new keys, not 6 additional rows). You then have the option of placing Dock and Tollbooth in the appropriate Voyage dimension. If you keep the generic data in these dimensions (VesselFlag, TruckCapacity) and the specific in a junk dimension (VesselName, VesselMMSI) they won't explode in size.