Does surrogate key assignment for a fact table require that the source data has natural keys

data-warehousedimensionetlfacttablessis

Let's say I have a simple OLTP database with orders, products, and customers:

enter image description here

And from it, I am building a data mart with an orders fact table, product dimension, customer dimension, and date dimension:

enter image description here

When loading the orders table into fact_orders (let's say I was using an SSIS Lookup Transformation to assign the surrogate keys), does that mean that the source of data for the orders would also need to have the natural "foreign key" values that were associated with the order in the OLTP system?

In other words, would the data that's being loaded come from a query like this?

SELECT 
  order_date,                   -- needed to get date surrogate key
  customer_name,                -- needed to get customer surrogate key
  product_name,                 -- needed to get product surrogate key
  order_number,                 -- denegenerate dimension,
  qty_ordered AS order_qty,     -- measure
  total_amount AS order_amount  -- measure
FROM orders o 
  INNER JOIN customers c 
    ON o.customer_id = c.customer_id
  INNER JOIN products p 
    ON o.product_id = p.product_id 

Best Answer

I'm not sure that you need "natural" keys, but you probably do need to maintain a key mapping of sorts. So you need to understand what relationships map between your source and target systems, identify the keys for those relationships and build your key mappings from there.

I had a previous question on this called "What is the best practice for mapping from natural keys to integer-based keys? (ETL)".

EDIT: So far I am seeing at least three, if not four mappings.

CustomersToDim_Customers (customer_id, dim_customer_id)
ProductsToDim_Products (product_id, dim_product_id)
OrderDatesToDim_Date (order_date, date_id) or (map_id,order_date,date_id) if you want to use a key to map.

And lastly, I see the order_id as your key to the fact table. So I would go

OrdersToFactOrders (order_id,dim_date_id,dim_customer_id,dim_product_id)

In my case I renamed the fields for the mart with dim_field_id because I didn't want name collision within my tables or confusion as to which Id they pointed to. Your ETL would have to know that CustomersToDim_Customers.dim_customer_id really maps to Dim_Customers.customer_id and that CustomersToDim_Customers.customer_id really maps to Customers.customer_id.

I would also be half-inclined to include the order_number in the OrdersToFactOrders mapping table, but that is because I like to have tracking data for audit purposes. Makes my life easier. But, based on what you told me, order_number and order_id are one-to-one so the inclusion of order_number would be redundant then and only necessary if you have a perfectionist paranoia to make sure your data is correct on both sides (I really like to make sure that A on side A and B on side B are really correct after the ETL is done.).