I think the biggest challenge will be doing the event type transformations efficiently, so I'm going to focus on that and ignore the schema completely.
My first thought (read: I've never actually tried this) is to use dynamic SQL to build up a SELECT
statement that will do the data transformation such that values end up in their correct columns (i.e., Col0
--> CompanyId
, or whatever). This boils down to a mapping table consisting of the event type, the source column, the target column, and a default value if required.
Once you have the mapping table (or tables, depending on how complex this actually is), you can use it to dynamically build a SQL statement of the form:
SELECT
(
CASE EventType
WHEN 1234 THEN COALESCE(Col0, 'DefaultCategory')
END
) AS Category,
(
CASE EventType
WHEN 1234 THEN COALESCE(Col1, 'DefaultCompany')
WHEN 5000 THEN COALESCE(Col0, 'DefaultCompany')
END
) AS Company,
(
CASE EventType
WHEN 5000 THEN COALESCE(Col1, 'DefaultLocation')
END
) AS Location
FROM RawSourceTable
Build the statement as part of the ETL process. If you need to add a new event type, just add the required column mappings to the mapping table, and they'll be handled the next time the ETL process runs.
Once the source data is in this kind of state, the rest of the import process should be pretty standard.
Best Answer
You are confusing 2 things in your question. Since you state you have duplicate attribute key errors you are using the the age table as a dimension, but since you also want to find averages you are using the age table as a fact table.
When using ages as a dimension you probably don't want every single age on your axis, so what is typically done is divide the ages in buckets and have the buckets stored as strings so you get something like this
You could either use discretization buckets in your dimension or create the buckets manually in your dimension table.
However if you want to average ages too, I suppose you want to see the average across some dimensions (such as average age of people ordering product X) so you need to create a fact table with the keys of the dimensions you want to report across and then define relationships between your fact table and dimension tables
Mixing the 2 concepts (dimensions and facts) in the same table will lead to headaches over time so you should either solve that in your ETL process or using named queries in your dmv in SSAS.