To answer your questions in order:
- The cube doesn't store medians or modes (or even averages), but you can write queries that calculate them and embed them as calculated measures in the cube. The ability to embed this sort of computation is one of the main selling points of OLAP technology.
- If you have a dimension that can identify individual rows (which could be a degenerate or 'fact' dimension derived from an identifier on the fact table), then you can query individual rows. OLAP works in terms of dimensions and aggregates, so each such query is really an aggregate composed of a single value.
- Any OLAP tool can do what's described in (2), and most also support a mechanism known as 'drill-through', where the cube returns a listing (a bordereau) of the transactional data underlying a given slice.
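To make drill-through concrete, here is a sketch of a DRILLTHROUGH statement in MDX. It borrows the [SalesTrends] cube and dimension names from the example later in this answer, so treat the member names as illustrative:

```
-- Return the underlying fact rows behind one cell:
-- Widget prices in February 2012, capped at 100 rows.
drillthrough maxrows 100
select ([Date].[Calendar].[Month].[201202]
       ,[Product].[Product].[Widget])
on columns
from [SalesTrends]
```

The result set is relational (one row per underlying fact record) rather than a cellset, which is why drill-through is the usual bridge between aggregate analysis and row-level inspection.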
If you want to do calculations that aren't directly available in the cube script, many OLAP tools such as the late, lamented ProClarity will allow you to formulate queries involving custom MDX calculations. As long as the cube contains the information needed for the calculation, custom MDX should be able to support any computation you need.
Although OLAP queries are traditionally associated with statistical aggregates, if you have a dimension that allows drill-down to the detail you need, it is certainly possible to formulate queries that calculate medians or percentiles, or histogram queries from which modes can be inferred or computed.
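As a sketch of this kind of statistical calculation, MDX has a built-in Median function. The query below (reusing the [SalesTrends] cube and dimensions defined later in this answer, so the names are illustrative) computes the median price per product across resellers:

```
-- Median price per product across the reseller dimension.
with
member [Measures].[Median Price]
as median ([Reseller].[Reseller].Children
          ,[Measures].[Price])
select [Measures].[Median Price] on columns
      ,[Product].[Product].Children on rows
from [SalesTrends]
where ([Date].[Calendar].[Month].[201202])
```

Percentile-style calculations can be built similarly by ranking the set and picking members by position.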
For example, this shows a Pareto analysis query based on rankings.
Many cube products can operate in a hybrid or relational OLAP mode where they do not persist the data themselves but query it from an underlying database. In addition, pure ROLAP tools such as Business Objects, Report Builder or Discoverer query an underlying database directly and can work row by row. However, they tend to lack the sophistication of dedicated OLAP products, and they have little statistical analysis capability out of the box.
If you have the following database schema with Date, Product and Reseller dimensions and a price fact table:
create table DimProduct (
ProductID int not null
,Product varchar (50)
)
go
create table DimDate (
DateID datetime not null
,MonthID int not null
)
go
create table DimReseller (
ResellerID int not null
,Reseller varchar (50)
)
go
create table FactPrices (
ProductID int not null
,ResellerID int not null
,DateID datetime not null
,Price money
)
go
insert DimProduct (ProductID, Product) values (1, 'Widget')
insert DimProduct (ProductID, Product) values (2, 'Dohickey')
insert DimProduct (ProductID, Product) values (3, 'Whoflungdung')
go
insert DimDate (DateID, MonthID) values ('2012-01-01', 201201)
insert DimDate (DateID, MonthID) values ('2012-02-01', 201202)
insert DimDate (DateID, MonthID) values ('2012-03-01', 201203)
go
insert DimReseller (ResellerID, Reseller) values (1, 'Braise and co')
insert DimReseller (ResellerID, Reseller) values (2, 'Cocost')
insert DimReseller (ResellerID, Reseller) values (3, 'Walls Mart')
go
insert FactPrices (ResellerID, ProductID, DateID, Price) values (1, 1, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (1, 1, '2012-02-01', 9)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (1, 2, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (1, 2, '2012-02-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (1, 3, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (1, 3, '2012-02-01', 11)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (2, 1, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (2, 1, '2012-02-01', 10.5)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (2, 2, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (2, 2, '2012-02-01', 9.90)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (2, 3, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (2, 3, '2012-02-01', 11)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (3, 1, '2012-01-01', 12)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (3, 1, '2012-02-01', 10.5)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (3, 2, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (3, 2, '2012-02-01', 11)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (3, 3, '2012-01-01', 10)
insert FactPrices (ResellerID, ProductID, DateID, Price) values (3, 3, '2012-02-01', 10.5)
go
Create a cube with Date, Product and Reseller dimensions and a SalesTrends fact table with a single metric 'Price'. The MDX query below will produce counts of products for which the price is increasing, static or decreasing in a given period.
with
member [Measures].[Delta]
as [Measures].[Price] -
(parallelperiod ([Date].[Calendar].[Month]
,1
,[Date].[Calendar].currentmember)
,[Measures].[Price])
member [Measures].[Rising]
as count (filter ([Reseller].[Reseller].Children
,[Measures].[Delta] > 0))
,solve_order = 20
member [Measures].[Static]
as count (filter ([Reseller].[Reseller].Children
,[Measures].[Delta] = 0))
,solve_order = 20
member [Measures].[Falling]
as count (filter ([Reseller].[Reseller].Children
,[Measures].[Delta] < 0))
,solve_order = 20
select {[Measures].[Rising]
,[Measures].[Static]
,[Measures].[Falling]}
on columns
,[Product].[Product].Children
on rows
from [SalesTrends]
where ([Date].[Calendar].[Month].[201202])
The query calculates the delta from the previous period, then derives the three metrics (rising, static, falling) from counts of resellers whose delta is positive, zero or negative. Note that the prices will aggregate in strange ways if you try to calculate this at anything but the lowest grain of the underlying fact table.
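If you want to sanity-check the cube results against the relational source, roughly the same rising/static/falling counts can be computed in T-SQL with a self-join on the fact table. This is a sketch against the sample schema above:

```
-- Compare each reseller's price with its price one month earlier,
-- then count resellers per product in each direction.
select p.Product
      ,sum (case when cur.Price > prev.Price then 1 else 0 end) as Rising
      ,sum (case when cur.Price = prev.Price then 1 else 0 end) as Static
      ,sum (case when cur.Price < prev.Price then 1 else 0 end) as Falling
from FactPrices cur
join FactPrices prev
  on prev.ProductID  = cur.ProductID
 and prev.ResellerID = cur.ResellerID
 and prev.DateID     = dateadd (month, -1, cur.DateID)
join DimProduct p
  on p.ProductID = cur.ProductID
where cur.DateID = '2012-02-01'
group by p.Product
-- With the sample data this should give:
--   Widget 1/0/2, Dohickey 1/1/1, Whoflungdung 3/0/0
```

The join on `dateadd (month, -1, ...)` plays the same role as parallelperiod in the MDX version.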
Best Answer
The StDev function does operate at a rollup grain determined by the first parameter you pass in. If you always want to calculate the StDev at the lowest level, you can write some mathematically equivalent measures and avoid the StDev MDX function altogether. This approach is described here. Let me know if that doesn't make sense. The MDX would be:
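As a sketch of what 'mathematically equivalent measures' means here: if the cube carries sum, sum-of-squares and count measures at the lowest grain (the measure names below are hypothetical), the population standard deviation can be assembled from them as sqrt(E[x^2] - E[x]^2) without calling StDev:

```
-- Hypothetical base measures: [Sum Price], [Sum Price Squared], [Price Count].
-- Population standard deviation via the sum-of-squares identity.
with
member [Measures].[Price StDev]
as (([Measures].[Sum Price Squared] / [Measures].[Price Count])
   - (([Measures].[Sum Price] / [Measures].[Price Count]) ^ 2)) ^ 0.5
select [Measures].[Price StDev] on columns
from [SalesTrends]
```

Because the three base measures aggregate additively, this calculation stays correct at any rollup level while still reflecting the lowest-grain rows.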
See Peter's answer for a full explanation of the steps he took.
Also, if you do stick with the MDX approach you are currently using, you will want to fix the first parameter as follows: your code factors the grand total into the calculation, whereas this MDX does not: