Standard way to replace codes with values from a lookup table for reporting or analytics

data-warehouseetljoin;performance

Suppose I have a "main" table that was probably heavily normalized and consists largely of columns that merely contain codes that are lookups into other tables (they are probably foreign keys but feel free not to assume so in your answer if it makes sense). For instance, maybe this main table is a list of people and one of the columns is for favorite_color. Of course then our lookup table would looks something like this:

color_cd    color_name
0           Unknown
1           Red
2           Blue
3           Orange
4           Other

Now if I'm creating reports from this main table or perhaps feeding this data into some software for domain experts to use, I'd rather replace the favorite_color code with the color_name from the lookup table. One solution of course is to do this with a simple JOIN in my SELECT:

SELECT main.col1, main.col2, colors.color_name as `Favorite Color` 
  FROM main LEFT OUTER JOIN colors ON (main.favorite_color = colors.color_cd);

This gets the job done. Now suppose that the main table has dozens or maybe hundreds of such columns and millions of rows. Some of the lookup tables are small, like the above example, but others might have hundreds or thousands of values.

Is there any sort of standard way of doing this transformation, or is the writing a long with dozens or hundreds of joins the way to go? Is this type of thing a typical component of any ETL product? Is it preferable to write a script that handles this without joins and outside of SQL? Please forgive what are likely some abuses or misuses of terminology; I'm not even sure if the above process has a name.

Best Answer

In the ETL process, you can replace the codes with names while loading a target table. I'll focus on Informatica PowerCenter but I'm sure other ETL tools offer a similar feature.

There is a Lookup transformation that is used to look up values (DNAME) from a a relational table (it may also be a view or a flat file), based on a defined lookup match criteria (source.DEPTNO = lookup.DEPTNO). These values may then be appended to the source rows and stored in a target table (that is used for reporting).

Lookup transformation

When a session is executed, a SELECT statement is generated for each lookup in the mapping. These statements are run against the data source once and the results are stored in the lookup cache. Later, when a value needs to be looked up, the transformation uses the cache.