What are some ways to implement a many-to-many relationship in a data warehouse

data-warehousedatabase-design

The dominant topologies of Data Warehouse modelling (Star, Snowflake) are designed with one-to-many relationships in mind. Query readability, performance, and structure degrades severely when faced with a many-to-many relationship in these modelling schemes.

What are some ways to implement a many-to-many relationship between dimensions or between the fact table and a dimension in a data warehouse and what compromises do they inflict with regards to necessary granularity and query performance?

Best Answer

In my experience, a recursive hierarchy is the most practical way of tackling this. It offers the following advantages:

  1. Unlimited depth.
  2. Compactness.
  3. Flexibility.
  4. Speed.

By contrast, it takes an extra table for each level of "-to-many" joins. This is hard coded and difficult to maintain against schema updates.

By using filtered indexes, a large table of hierarchical joins can perform at superior speed to dedicated tables. The reason is each join is only "parent-child" compared to "to join table to data table". The latter has more indexes to process and store.

I've been trying to solve this problem for many years. Recently, this is what I came up with.