Sql-server – MDX Max, Average and StDev functions not giving correct results

business-intelligencemdxsql serverssasstatistics

I have a scenario where I have Date dimension and Time dimension along with some other dimensions. The time dimension contains only Hours since there was no need to save data at minute or second level.

A situation requires me to calculate Average of Max of value of certain hours of day. For instance, I need the Max of value between 01:00 to 04:00 hours and average of these maximum values in the given time period (01 January 2015 – 03 January 2015 in this example).

I am filtering Date and Time dimensions (along with other dimensions) as Subselect From clauses instead of using Where.

If the Time dimension members remain the same for each selected value of Date dimension, then Max, Avg and StDev give correct result, but if I have to exclude some hours from one of the Date dimension members, Max results in error, and Avg and StDev give incorrect values.

Please refer to the SQL code below to create the example I will be using in this question:

create database TestDateTimeDimensions
GO

USE [TestDateTimeDimensions]
GO

create table DimDate (
DateId int not null PRIMARY KEY,
Date [Date] not null
)


insert into DimDate
select 20150101, CONVERT(DATE, '2015-01-01')
UNION ALL
select 20150102, CONVERT(DATE, '2015-01-02')
UNION ALL
select 20150103, CONVERT(DATE, '2015-01-03')
UNION ALL
select 20150104, CONVERT(DATE, '2015-01-04')

create table DimTime(
TimeId int not null PRIMARY KEY,
[Time] Time not null
)

insert into DimTime
select 0, CONVERT(TIME, '00:00:00')
UNION ALL
select 1, CONVERT(TIME, '01:00:00')
UNION ALL
select 2, CONVERT(TIME, '02:00:00')
UNION ALL
select 3, CONVERT(TIME, '03:00:00')
UNION ALL
select 4, CONVERT(TIME, '04:00:00')
UNION ALL
select 5, CONVERT(TIME, '05:00:00')
UNION ALL
select 6, CONVERT(TIME, '06:00:00')
UNION ALL
select 7, CONVERT(TIME, '07:00:00')
UNION ALL
select 8, CONVERT(TIME, '08:00:00')
UNION ALL
select 9, CONVERT(TIME, '09:00:00')
UNION ALL
select 10, CONVERT(TIME, '10:00:00')
UNION ALL
select 11, CONVERT(TIME, '11:00:00')
UNION ALL
select 12, CONVERT(TIME, '12:00:00')
UNION ALL
select 13, CONVERT(TIME, '13:00:00')
UNION ALL
select 14, CONVERT(TIME, '14:00:00')
UNION ALL
select 15, CONVERT(TIME, '15:00:00')
UNION ALL
select 16, CONVERT(TIME, '16:00:00')
UNION ALL
select 17, CONVERT(TIME, '17:00:00')
UNION ALL
select 18, CONVERT(TIME, '18:00:00')
UNION ALL
select 19, CONVERT(TIME, '19:00:00')
UNION ALL
select 20, CONVERT(TIME, '20:00:00')
UNION ALL
select 21, CONVERT(TIME, '21:00:00')
UNION ALL
select 22, CONVERT(TIME, '22:00:00')
UNION ALL
select 23, CONVERT(TIME, '23:00:00')

CREATE TABLE Fact (
Id int identity not null PRIMARY KEY,
DateId int not null REFERENCES DimDate(DateId),
TimeId int not null REFERENCES DimTime(TimeId),
value int not null
)

insert into Fact

select DateId, TimeId, ROUND(RAND(CONVERT(varbinary, NEWID())) * 100, 2)
FROM DimDate, DimTime

The MDX query that works correctly is:

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))

SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id]
SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id]

select
{
    [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax
}
on 0,
{
    DateSet * TimeSet
} on 1

FROM 
(
    SELECT ({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150103]}, {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]})
    on 0
    FROM [Test Date Time Dimensions]
)

It displays the result correctly as:

Correct Result

Note that the items included in Axis 1 are not used and the actual query returns only one row. They are just for this example.

The query that doesn't work as expected is:

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))

SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id]
SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id]

select
{
    [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax
}
on 0

,
{
    DateSet * TimeSet
} on 1

FROM 
(
    SELECT {({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150102]} * {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]})
    , ([Dim Date].[Date Id].&[20150103] * [Dim Time].[Time Id].&[3] : [Dim Time].[Time Id].&[4])}
    on 0
    FROM [Test Date Time Dimensions]
)

Here, I have excluded Hours 01:00 and '02:00from Date3 January 2015. The maximum value (70) is at hour01:00of3 January 2015but is still being used to calculate Avg andStDev, andMax` column is giving error, as shown in the image:

Error

Here Average and Standard Deviation are both incorrect.

Please help me finding a way to correctly calculate the results in this second case, and explain why Max is giving error and Avg and StDev functions still calculating including the Maximum value that is not included in the filtered Subselect clause.

A less than ideal workaround that I have come up with is to add a new DateTime Dimension that has both Date and Time, and use this DateTime dimension in the Subselect clause, and utilize the existing Time dimension to calculate Max and Date dimension to get Average and Standard Deviation, as in code below:

Create and fill the DateTime dimension:

create table DimDateTime
(
    DateTimeId int not null PRIMARY KEY,
    [Date] Date not null,
    [Time] Time not null 
)

insert into DimDateTime

select ((DateId * 100) + TimeId) DateTimeId, DimDate.Date, DimTime.Time
from DimDate, DimTime

alter table Fact add DateTimeId int References DimDateTime(DateTimeId)

update Fact set DateTimeId = ((DateId * 100) + TimeId)

The MDX query returning correct result:

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))

SET DateSet as NONEMPTY ([Dim Date].[Date Id].[Date Id])
SET TimeSet as NONEMPTY ([Dim Time].[Time Id].[Time Id])

select
{
    [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax
}
on 0,
{
    NONEMPTY(DateSet * TimeSet)
} on 1

FROM 
(
    SELECT ({[Dim Date Time].[Date Time Id].&[2015010101] : [Dim Date Time].[Date Time Id].&[2015010104],
            [Dim Date Time].[Date Time Id].&[2015010201] : [Dim Date Time].[Date Time Id].&[2015010204],
            [Dim Date Time].[Date Time Id].&[2015010303] : [Dim Date Time].[Date Time Id].&[2015010304]})
    on 0
    FROM [Test Date Time Dimensions]
)

But I would prefer to avoid creating a new dimension, in addition to the existing two dimensions containing the same data. Secondly I want to avoid getting the data into the application and calculating the result in API methods. So the preference is to make Max, Avg and StDev functions to work correctly.

Using SQL Server 2012.

Best Answer

I tried doing it in MDX but to no avail, therefore I had to use the SSAS User Defined Function to calculate the Avg of Max. The set to calculate Avg (DateSet) across, set to calculate Max across (TimeSet) and a third set to containing the records to be excluded in Max operation, are passed as parameters to the UDF.

The basic code is like (presenting the coded Avg of Max functionality, excluding error checking):

public static class AvgOfMaxMethods
    {
        public static double AvgOfMax(Set maxAcrossSet, Set avgAcrossSet, Set setToExclude, Expression measureExpression)
        {
            var setToExcludeQueryable = setToExclude.Tuples.OfType<Microsoft.AnalysisServices.AdomdServer.Tuple>();
            IList<double> maxMembers = new List<double>();
            foreach(var avgAcrossTuple in avgAcrossSet.Tuples)
            {
                var max = double.MinValue;
                foreach(var maxAcrossTuple in maxAcrossSet.Tuples)
                {
                    if (!setToExcludeQueryable.Any(tuple => tuple.Members[0].UniqueName.Equals(avgAcrossTuple.Members[0].UniqueName) && tuple.Members[1].UniqueName.Equals(maxAcrossTuple.Members[0].UniqueName)))
                    {
                        TupleBuilder tb = new TupleBuilder(avgAcrossTuple.Members[0]);
                        tb.Add(maxAcrossTuple.Members[0]);
                        var calculatedVal = measureExpression.Calculate(tb.ToTuple()).ToDouble();
                        max = calculatedVal > max ? calculatedVal : max;
                    }
                }
                if (!max.Equals(double.MinValue))
                {
                    maxMembers.Add(max);
                }
            }
            return maxMembers.Average();
        }
    }

An with this the query becomes (keeping the original Max and StDev for comparison):

WITH
MEMBER MaxMember as MAX(TimeSet, [Measures].[Value])
MEMBER AvgOfMax as Avg(DateSet, MAX(TimeSet, [Measures].[Value]))
MEMBER StDevOfMax as StDev(DateSet, MAX(TimeSet, [Measures].[Value]))
Member MaxFromUDF as UDFPoc.UDFPoc.AvgOfMaxMethods.AvgOfMax(TimeSet, DateSet, SetToExclude, [Measures].[Value])

SET DateSet as EXISTING [Dim Date].[Date Id].[Date Id]
SET TimeSet as EXISTING [Dim Time].[Time Id].[Time Id]
SET SetToExclude as {[Dim Date].[Date Id].&[20150103] * [Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[2]}

select
{
    [Measures].[Value], MaxMember, AvgOfMax, StDevOfMax, MaxFromUDF
}
on 0,
{
    DateSet * TimeSet
} on 1

FROM 
(
    SELECT ({[Dim Date].[Date Id].&[20150101] : [Dim Date].[Date Id].&[20150103]}, {[Dim Time].[Time Id].&[1] : [Dim Time].[Time Id].&[4]})
    on 0
    FROM [Test Date Time Dimensions]
)

And the results look like (here tuples of the excluded set is highlighted in red):

enter image description here

As can be seen, the UDF excludes the members of each tuple in the SettoExclude set when calculating Max and Average. Now correct average considering the excluded set is being calculated, as can be seen in the image. Similarly StDev can be calculated in this manner.

See this as a guide to create SSAS User Defined Functions.