MongoDB – What are the best database analytics companies for analyzing several thousand or millions of products, summing them together and analyzing them statistically

data-warehouse mongodb optimization

We will have several thousand products in our database. The problem is that one product can have several different names, e.g. "iPad 16GByte UMTS schwarz" or "iPad 16GB 3G black", but they are the same product. We would like to combine them for an exact analysis (e.g. how many products there are in which city over a specific period of time). What would be the best approach? Who are the best experts for this?

Best Answer

I'm not sure how you would do this with a NoSQL database like MongoDB, but in a typical data warehouse I would handle it with a hierarchy in the product dimension table.

Product_ID   Full_Product_Nm            Product_Lvl1       Product_Lvl2
----------   ---------------            --------------     ------------
1            iPad 16GByte UMTS schwarz  iPad               iPad 16gb
2            iPad 16GB 3G black         iPad               iPad 16gb
3            iPad 64GByte UMTS schwarz  iPad               iPad 64gb
4            iPad 64GB 3G black         iPad               iPad 64gb

Then I could join my fact table to the dimension and pull out the hierarchy level that I would like to report on.

So in your example I could pull out Product_Lvl1 to get iPad, but if I wanted to compare the sales/inventory of the 64gb iPad with the 16gb iPad, I could do that using Product_Lvl2.
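A minimal sketch of this approach, using Python's built-in sqlite3 module as a stand-in for the warehouse (an assumption; the answer does not name a database). The product_dim rows mirror the table above; the sales_fact table, its columns (city, sale_date, quantity) and its rows are hypothetical data made up for illustration. The same join rolls the facts up either by Product_Lvl1 or by Product_Lvl2, per city and time period, which is the kind of analysis the question asks about.

```python
import sqlite3

# In-memory SQLite database standing in for the data warehouse (illustration only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Product dimension with a two-level hierarchy, as in the table above.
cur.execute("""
    CREATE TABLE product_dim (
        product_id      INTEGER PRIMARY KEY,
        full_product_nm TEXT,
        product_lvl1    TEXT,
        product_lvl2    TEXT
    )""")
cur.executemany(
    "INSERT INTO product_dim VALUES (?, ?, ?, ?)",
    [
        (1, "iPad 16GByte UMTS schwarz", "iPad", "iPad 16gb"),
        (2, "iPad 16GB 3G black", "iPad", "iPad 16gb"),
        (3, "iPad 64GByte UMTS schwarz", "iPad", "iPad 64gb"),
        (4, "iPad 64GB 3G black", "iPad", "iPad 64gb"),
    ],
)

# Hypothetical fact table: one row per sale, with city and date.
cur.execute("""
    CREATE TABLE sales_fact (
        product_id INTEGER REFERENCES product_dim(product_id),
        city       TEXT,
        sale_date  TEXT,
        quantity   INTEGER
    )""")
cur.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [
        (1, "Berlin", "2012-03-01", 2),
        (2, "Berlin", "2012-03-05", 1),
        (3, "Berlin", "2012-03-07", 4),
        (4, "Munich", "2012-03-09", 3),
        (2, "Munich", "2012-04-02", 5),  # outside the March period queried below
    ],
)

ALLOWED_LEVELS = ("product_lvl1", "product_lvl2", "full_product_nm")

def sales_by(level, start, end):
    """Sum quantity per (hierarchy level, city) for sales dated between start and end."""
    # The level is interpolated as a column name, so only accept known columns.
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    query = f"""
        SELECT d.{level}, f.city, SUM(f.quantity)
        FROM sales_fact f
        JOIN product_dim d ON d.product_id = f.product_id
        WHERE f.sale_date BETWEEN ? AND ?
        GROUP BY d.{level}, f.city
        ORDER BY d.{level}, f.city"""
    return cur.execute(query, (start, end)).fetchall()

# [('iPad', 'Berlin', 7), ('iPad', 'Munich', 3)]
print(sales_by("product_lvl1", "2012-03-01", "2012-03-31"))
# [('iPad 16gb', 'Berlin', 3), ('iPad 64gb', 'Berlin', 4), ('iPad 64gb', 'Munich', 3)]
print(sales_by("product_lvl2", "2012-03-01", "2012-03-31"))
```

Because the differently named variants share a Product_Lvl2 value, "iPad 16GByte UMTS schwarz" and "iPad 16GB 3G black" are counted together without touching the fact rows; only the dimension needs maintaining when a new name variant appears.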

Related Solutions

Standard way to replace codes with values from a lookup table for reporting or analytics

In the ETL process, you can replace the codes with names while loading the target table. I'll focus on Informatica PowerCenter, but I'm sure other ETL tools offer a similar feature.

There is a Lookup transformation that is used to look up values (DNAME) from a relational table (it may also be a view or a flat file), based on a defined lookup condition (source.DEPTNO = lookup.DEPTNO). These values can then be appended to the source rows and stored in a target table that is used for reporting.


When a session is executed, a SELECT statement is generated for each lookup in the mapping. These statements are run against the data source once and the results are stored in the lookup cache. Later, when a value needs to be looked up, the transformation uses the cache.
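The cached-lookup idea can be sketched outside PowerCenter. The following Python example (using the standard sqlite3 module, not Informatica's API) runs one SELECT against the lookup table, keeps the result in a dictionary as the cache, and then appends DNAME to each source row while loading a reporting table. The EMP/DEPT tables, their rows and the emp_report target are illustrative assumptions, and turning unmatched codes into NULL is a choice made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Lookup table (classic DEPT example) and a source table that carries only codes.
cur.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
cur.executemany("INSERT INTO dept VALUES (?, ?)",
                [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES")])
cur.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER)")
cur.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(7782, "CLARK", 10), (7369, "SMITH", 20),
                 (7499, "ALLEN", 30), (9999, "NOBODY", 40)])

# Reporting target: the code is kept and the looked-up name is stored next to it.
cur.execute("CREATE TABLE emp_report "
            "(empno INTEGER, ename TEXT, deptno INTEGER, dname TEXT)")

# Build the lookup cache with a single SELECT, run once per load.
lookup_cache = dict(cur.execute("SELECT deptno, dname FROM dept"))

# Read the source rows and resolve each code from the cache, not from the database.
rows = []
for empno, ename, deptno in cur.execute(
        "SELECT empno, ename, deptno FROM emp ORDER BY empno"):
    # Codes with no match in the lookup table become None (NULL) in this sketch.
    rows.append((empno, ename, deptno, lookup_cache.get(deptno)))

cur.executemany("INSERT INTO emp_report VALUES (?, ?, ?, ?)", rows)
conn.commit()

print(cur.execute("SELECT * FROM emp_report ORDER BY empno").fetchall())
```

Because the cache is built once, each source row costs a dictionary lookup instead of a round trip to the database; this trade-off only works while the lookup table fits comfortably in memory.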

Oracle Unique Index – Why Oracle is Not Using a Unique Index for a Long Key

(This answers the other question about why the histograms are different.)

With the default METHOD_OPT of FOR ALL COLUMNS SIZE AUTO, histograms are created based on column skew and on whether the column was used in a relevant predicate. Copying the DDL and the data is not enough; the workload information is also important.

According to the Performance Tuning Guide:

When you drop a table, workload information used by the auto-histogram gathering feature and saved statistics history used by the RESTORE_*_STATS procedures is lost. Without this data, these features do not function properly.

For example, here is a table with skewed data but no histogram:

drop table test1;
create table test1(a date);
insert into test1 select date '2000-01-01'+level from dual connect by level <= 10;
insert into test1 select date '2000-01-01' from dual connect by level <= 1000;
begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
select histogram from user_tab_columns where table_name = 'TEST1';

HISTOGRAM
---------
NONE

Running the same steps, but with a query that filters on column A before the statistics are gathered, generates a histogram.

drop table test1;
create table test1(a date);
insert into test1 select date '2000-01-01'+level from dual connect by level <= 10;
insert into test1 select date '2000-01-01' from dual connect by level <= 1000;
select count(*) from test1 where a = sysdate; --Only new line
begin
    dbms_stats.gather_table_stats(user, 'TEST1');
end;
/
select histogram from user_tab_columns where table_name = 'TEST1';

HISTOGRAM
---------
FREQUENCY
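To see why the skew in TEST1 matters, here is a simplified Python model of the same data (10 dates with one row each, plus 1000 rows of 2000-01-01). It assumes the basic uniform estimate (rows divided by distinct values) when there is no histogram, and the exact per-value count when a frequency histogram exists. The real Oracle optimizer formulas have more details (density, nulls, values missing from the histogram), so treat this only as an illustration of the idea, not as Oracle's algorithm.

```python
from collections import Counter
from datetime import date, timedelta

# Rebuild the TEST1 data: date '2000-01-01' + level for level 1..10,
# then 1000 rows of date '2000-01-01'.
base = date(2000, 1, 1)
rows = [base + timedelta(days=level) for level in range(1, 11)]
rows += [base] * 1000

num_rows = len(rows)          # 1010 rows
frequency = Counter(rows)     # one "bucket" per distinct value
num_distinct = len(frequency) # 11 distinct dates

def estimate_without_histogram(value):
    # value is ignored: without a histogram every value gets the same estimate.
    return num_rows / num_distinct

def estimate_with_frequency_histogram(value):
    # Exact count per value. This sketch returns 0 for values absent from the
    # data; the real optimizer does not simply use 0 in that case.
    return frequency[value]

popular = base
print(round(estimate_without_histogram(popular), 1),
      estimate_with_frequency_histogram(popular))  # prints 91.8 1000
```

Without the histogram, an equality predicate on the popular date is estimated at about 92 rows instead of 1000, which is the kind of misestimate that can change the chosen access path.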
