In the ETL process, you can replace the codes with names while loading a target table. I'll focus on Informatica PowerCenter, but I'm sure other ETL tools offer a similar feature.
There is a Lookup transformation that is used to look up values (DNAME) from a relational table (it may also be a view or a flat file), based on a defined lookup condition (source.DEPTNO = lookup.DEPTNO). These values may then be appended to the source rows and stored in a target table (that is used for reporting).
When a session is executed, a SELECT statement is generated for each lookup in the mapping. These statements are run against the data source once and the results are stored in the lookup cache. Later, when a value needs to be looked up, the transformation uses the cache.
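The same idea can be sketched outside PowerCenter in a few lines of Python: run the lookup SELECT once, keep the results in an in-memory cache (a dict keyed on DEPTNO), then append DNAME to each source row from the cache. This is only an illustration under assumptions: the DEPT table, the sample rows and the function names build_lookup_cache and enrich are hypothetical, and it does not reflect how PowerCenter implements its cache internally.

```python
import sqlite3


def build_lookup_cache(conn: sqlite3.Connection) -> dict:
    """Run the lookup SELECT once and cache DEPTNO -> DNAME in memory."""
    rows = conn.execute("SELECT deptno, dname FROM dept").fetchall()
    return {deptno: dname for deptno, dname in rows}


def enrich(source_rows, cache):
    """Append DNAME to each source row (source.DEPTNO = lookup.DEPTNO).

    Rows with no matching DEPTNO get None, roughly like an unmatched lookup returning NULL.
    """
    return [{**row, "dname": cache.get(row["deptno"])} for row in source_rows]


# Hypothetical lookup table, modelled on the classic SCOTT.DEPT table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.executemany(
    "INSERT INTO dept VALUES (?, ?)",
    [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")],
)

cache = build_lookup_cache(conn)  # a single SELECT against the lookup source
source = [
    {"empno": 7369, "deptno": 20},
    {"empno": 7499, "deptno": 30},
    {"empno": 9999, "deptno": 99},  # no matching department
]
target = enrich(source, cache)  # every row is resolved from the cache, not the database
for row in target:
    print(row)
```

The point of the cache is that the lookup source is queried once per session rather than once per source row; each subsequent match is a constant-time dictionary access.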
(This answers the other question about why the histograms are different.)
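By default (METHOD_OPT => 'FOR ALL COLUMNS SIZE AUTO'), histograms are created based on column skew and on whether the column has been used in a relevant predicate. Copying the DDL and the data is not enough; the workload information (which columns queries have actually filtered on) is also important.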
According to the Performance Tuning Guide:
When you drop a table, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures is lost. Without this data, these features do not function properly.
For example, here is a table with skewed data but no histogram:
drop table test1;
create table test1(a date);
insert into test1 select date '2000-01-01'+level from dual connect by level <= 10;
insert into test1 select date '2000-01-01' from dual connect by level <= 1000;
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
select histogram from user_tab_columns where table_name = 'TEST1';
HISTOGRAM
---------
NONE
Running the same script, but with a query that filters on column A before the statistics are gathered, generates a histogram.
drop table test1;
create table test1(a date);
insert into test1 select date '2000-01-01'+level from dual connect by level <= 10;
insert into test1 select date '2000-01-01' from dual connect by level <= 1000;
select count(*) from test1 where a = sysdate; -- the only new line: a predicate on column A
begin
dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
select histogram from user_tab_columns where table_name = 'TEST1';
HISTOGRAM
---------
FREQUENCY
I'm not sure how you would do this with a NoSQL database like MongoDB, but in a typical data warehouse I would do this with a hierarchy in the dimension table.
Then I could join my fact table to the dimension and pull out the hierarchy level that I would like to report on.
So in your example I could use Product_Lvl1 to get iPad, but if I wanted to compare sales/inventory of the 64GB iPad with the 16GB iPad, I could do that using Product_Lvl2.
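A minimal sketch of that approach, using Python's built-in sqlite3 module as a stand-in for the warehouse. Only the Product_Lvl1/Product_Lvl2 hierarchy comes from the question; the table names dim_product and fact_sales, the units_sold measure, the helper sales_by_level and the sample rows are all hypothetical. The same fact-to-dimension join is run twice, grouping by a different level of the hierarchy each time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_lvl1 TEXT,   -- top level of the hierarchy, e.g. 'iPad'
        product_lvl2 TEXT    -- finer level, e.g. 'iPad 16GB'
    );
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        units_sold  INTEGER
    );
    INSERT INTO dim_product VALUES
        (1, 'iPad', 'iPad 16GB'),
        (2, 'iPad', 'iPad 64GB');
    INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 4);
    """
)


def sales_by_level(level_column: str):
    """Join the fact table to the product dimension and sum units at one hierarchy level."""
    # Only allow known column names, because they are interpolated into the SQL text.
    if level_column not in ("product_lvl1", "product_lvl2"):
        raise ValueError(f"unknown hierarchy level: {level_column}")
    sql = f"""
        SELECT d.{level_column}, SUM(f.units_sold)
        FROM fact_sales f
        JOIN dim_product d ON d.product_key = f.product_key
        GROUP BY d.{level_column}
        ORDER BY d.{level_column}
    """
    return conn.execute(sql).fetchall()


print(sales_by_level("product_lvl1"))  # [('iPad', 12)]
print(sales_by_level("product_lvl2"))  # [('iPad 16GB', 8), ('iPad 64GB', 4)]
```

Because every level of the hierarchy is stored as a column on the dimension row, moving between "all iPads" and "16GB vs 64GB" only changes the GROUP BY column; the join itself stays the same.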