Sql-server – Combining retail sales and payment OLAP cubes

business-intelligencedata-warehouseolapsql-server-2005ssas

I'm using SQL Server 2005 Standard Edition Analysis Services (SSAS). I'm trying to build cubes from a retail OLTP database that can answer these kinds of questions:

List down all products that were paid with certain type of payment (which products are bought with voucher type X, etc)
Create a breakdown of frequency of payment type used to pay for product X (how many times cash is used to pay for product X, how many times debit, credit, voucher, etc)

There are 3 main transaction tables: SALESHEADERS, SALESDETAILS, and SALESPAYMENTS.

SALESHEADERS contains sales date, cashier, POS code, etc.
SALESDETAILS contains header code, product code, quantity, unit price, gross, discount, etc.
SALESPAYMENTS contains header code, payment type code, payment amount, etc.

One transaction can contain many products. One transaction can be paid with multiple types of payment. There is no connections between detail and payment except through header (sales ID).

Currently I have created 2 cubes, one sales cube and one payment cube.

Sales cube is created by joining the header and detail. Payment cube is created by joining the header and payment.

Both cubes are performing well, but they can't answer the questions at the top of my post.

Best Answer

You need to start off with a cross join of detail and payment, because you say "products in any transactions that includes a certain type of payment are considered to be paid with that payment type", so you are effectively asking for parts of a transaction to be counted multiple times if there are multiple payment types used.

Related Solutions

Medians, Modes, Percentiles and OLAP

To answer your questions in order:

The cube doesn't store medians, modes (or even averages), but you can write queries that calculate them and embed them as calculated measures in the cube. The ability to embed this sort of computation is one of the main unique selling points of OLAP technology.
If you have a dimension that can identify individual rows (which could be a degenerate or 'fact' dimension derived from an identifier on the fact table) then you can do a query based on individual rows. However, OLAP works in terms of dimensions and aggregates, so you would need to have a dimension capable of identifying individual rows (with an aggregate composed of one value).
Any OLAP tool can do what's described in (2), plus they generally support a mechanism known as 'drill-through' where the cube will return a bordereaux of the transactional data underlying a given slice that you drill through into.

If you want to do calculations that aren't directly available in the cube script, many OLAP tools such as the late, lamented ProClarity will allow you to formulate queries involving custom MDX based calculations. Unless the cube doesn't have the information you need to do the actual calculations, the custom MDX calculations should be able to support any computation you need.

Although OLAP queries are traditionally associated with statistical queries in aggregate, if you have a dimension that allows drill down to detail you need it is certainly possible to formulate queries that will calculate medians, percentiles or histogram queries from which modes can be inferred or computed.

For example, this has an example of a pareto analysis query, which is based on rankings.

Many cube products can operate in a hybrid or relational OLAP mode where they do not persist the data themselves but query it from an underlying database. In addition, pure ROLAP tools such as Business Objects, Report Builder or Discoverer can query from an underlying database and do work row by row. However, they tend to lack the sophistication of dedicated OLAP products, and they don't have much in the way of statistical analysis capability out of the box.

Sql-server – What are Measures and Dimensions in Cubes

Think about a process or event that you want to analyze.

Let's say you were building Lougle Analytics, and want to analyze visits (as opposed to single-page requests) to your site. Visiting a website is a process.

A fact table represents a process or event you want to analyze, in this case it is a list of site visits. You can have as many fact tables as you want, one per process or event.

What things might be useful to you in analyzing site visits?

Information about the web browser (brand, screen resolution, ...)
Information about the user (country, state, city, ISP, based on their IP address)
Information about when the visit started (year, quarter, month, week, day)
The duration of the visit
The referring page, landing page, and exit page (title, url path)
The number of pages visited during the visit

Technically, you could put this all into one table - as you would in Excel, but that'd get real big real fast, so we'll do ONE level of normalization. Dimensional design is often just "put everything you need for an analysis in one big table", and then normalize out one level.

So your fact table will look like this:

web_browser_key          bigint
ip_address_key           bigint
start_date_key           int
referring_page_key       bigint
landing_page_key         bigint
exit_page_key            bigint
duration_seconds         int
number_of_pages_visited  int

It has a bunch of key values that refer to values in other tables†, and two non-keys. The non-keys are numeric values and are called Measures. Can you take the average of the web browser brand name? No, so it's a dimension. Can you take the average of the visit duration? Yes, so it's a measure.

The other tables are called Dimension tables, and the ip_address dimension table might look like this:

ip_address_key  bigserial primary key, /* use meaningless surrogate keys */
ip_address      inet unique,
country         text,
division        text,
locality        text,
latitude        numeric(8,6),
longitude       numeric(9,6)

Notice that it is not normalized: country could be derived from city (locality). However in a data warehouse, we care about facilitating analysis first. By facilitating analysis, I mean avoiding deep joins.

Notice that some of the data in the dimension table is hierarchical: country > division > locality. Your OLAP server (hopefully) understands hierarchies to support drill-down operations.

A logical Cube is just the collection of Dimensions and Measures that you are working with. Unlike a Rubik's cube, it can have more than 3 dimensions. Just think of a dimension as a column in your data set.

OLAP is a set of operations you can do against a data set, such a pivoting, slicing, dicing, drilling. Think of Excel PivotTables. An OLAP Server facilitates OLAP operations.

† usually without foreign keys

Best Answer

Related Solutions

Medians, Modes, Percentiles and OLAP

Sql-server – What are Measures and Dimensions in Cubes

Related Question