How to calculate/store Top 10 in a tabular model

ssastabular-model

We have recently created an SSAS tabular model so our users can access it via PowerView. We have a measure on one of our fact tables to get the TotalActiveItems using a formula:

TotalActive:=COUNTAX(FILTER('Stats', ISBLANK([DeactDate]) = TRUE), 1)

This works great as needed but now we have a request to get the Top 10 parents for each month in the TotalActive.

For reference, here is part of our model:

create table factStats
(
    StatsID INT IDENTITY NOT NULL PRIMARY KEY,
    DevID INT NOT NULL,
    DeactDate DATETIME NULL,
    BillDateTimeID BIGINT NOT NULL,
    CustID INT NOT NULL,
    ParentID INT NOT NULL
);

create table dimCust
(
    CustID INT NOT NULL PRIMARY KEY,
    CustName varchar(150) NOT NULL
);

create table dimParent
(
    ParentID INT NOT NULL PRIMARY KEY,
    ParentName varchar(100) NOT NULL
);

create table dimDateTime
(
    DateTimeID BIGINT NOT NULL PRIMARY KEY
);

SQL Fiddle with tables and sample data.

The factStats table has FKs to the DevID, CustID, BillDateTimeID, and ParentID. The request that we have is to either calculate or store the Top 10 Parents for each BillDateTimeID based on the TotalActive AND include everything not in the Top 10 in a rolled up category similar to the following:

+----------------+------------+------+
| BillDateTimeID |   Parent   | Rank |
+----------------+------------+------+
|       20140801 | Jim        |    1 |
|       20140801 | Bob        |    2 |
|       20140801 | All Others |    3 |
+----------------+------------+------+

I can easily accomplish this in SQL using windowing functions but trying to reproduce this for SSAS has been difficult. In SQL, we'd get the result using:

;with Total as
(
  select 
    ParentID,
    BillDateTimeID,
    sum(case when DeactDate is null then 1 else 0 end) TotalActive
  from factStats
  group by ParentID, BillDateTimeID
),
PRank as
(
  select 
    ParentID,
    BillDateTimeID,
    TotalActive,
    row_number() over(partition by BillDateTimeID 
                      order by TotalActive desc) pr
  from total
)
select 
  parentid,
  BillDateTimeID,
  TotalActive,
  pr
from prank
where pr <= 2
union all
select 
  0,
  BillDateTimeID,
  sum(TotalActive) TotalActive,
  3
from prank
where pr > 2
group by BillDateTimeID
order by BillDateTimeID desc, pr;

SQL Fiddle Demo.

I've tried several different ways to get the result but each one has had an issue. My attempts are below.

Initially, I was able to somewhat get the data using an MDX query but then had no clue how to incorporate this into our tabular model. The MDX query for reference is:

with 
set [Top10Parent] AS
(
    (TOPCOUNT({ORDER(({[Parent].[Parent Name].[Parent Name]}),
        ([Measures].[Total Count]), BDESC)}, 10))
)
MEMBER [Parent].[Parent Name].[Others] AS
(
    AGGREGATE(EXCEPT([Parent].[Parent Name].[Parent Name], [Top10Parent]))
)
select 
    [Measures].[Total Count] on columns,
    {[Top10Parent]}+ {[Parent].[Parent Name].[Others]} on Rows
from [OurModel]
where {[Date and Time].[Month and Year].[Month and Year].[Jul 2014]};

Of course, this also only gave me the result for a single month, not every month.

When I realized that the MDX query wouldn't work, I started out by altering our factStats table to include a new column to flag the items in the Top 10 and in the rolled up value.

alter table factStats
    add Top10ParentID INT NOT NULL
    constraint DF_factStats default (0);

The default constraint references our "Rolled Up" value for the Top 10.

Attempt #1: I created new Top 10 table to store the ParentID, name and the Rank:

create table dimTop10Parent
(
    Top10ParentID INT NOT NULL PRIMARY KEY,
    ParentName varchar(100) NOT NULL,
    Parent_Rank INT NOT NULL
);

This table will then be populated each time we refresh our model with the new Top 10 Parents based on the Total Active items that they have. The Parent_Rank column is then hidden in our tabular model and used exclusively for sorting. This works great, except we don't have the ability to historically get the Top 10 since it is not based on a month to month basis.

Attempt #2: Create a new table to store the Top 10 but the PRIMARY KEY will include both the Top10ParentID and a BillingDateTimeID.

create table dimTop10Parent
(
    Top10ParentID INT NOT NULL,
    ParentName varchar(100) NOT NULL,
    Parent_Rank INT NOT NULL,
    BillDateTimeID BIGINT NOT NULL
);

The problem with this is we can't create a relationship between the factStats single FK to the two part PK in the dimTop10Parent in the tabular model.

Attempt #3: Create the new table but use an identity as the PK.

create table dimTop10Parent
(
    Top10ID INT IDENTITY NOT NULL PRIMARY KEY,
    Top10ParentID INT NOT NULL,
    ParentName varchar(100) NOT NULL,
    Parent_Rank INT NOT NULL,
    BillDateTimeID BIGINT NOT NULL
);

The factStats table will store the Top10ID value which will be unique for each row. I thought this would resolve my problem but, it didn't because we can no longer sort by the Parent_Rank in the model, it throws an error:

Cannot sort ParentName by Parent_Rank because at least one value in ParentName has multiple distinct values in Parent_Rank. For example, you can sort [City] by [Region] because there is only one region for each city, but you cannot sort [Region] by [City] because there are multiple cities for each region.

Using the sample data the final result should be similar to (this showing Top 2 with a 3rd rolled up):

| PARENTNAME | BILLDATETIMEID | TOTALACTIVE | PR |
|------------|----------------|-------------|----|
|     FDN    |   201408010000 |          11 |  1 |
|     FDO    |   201408010000 |           3 |  2 |
| All Others |   201408010000 |           5 |  3 |
|     FDN    |   201407010000 |          12 |  1 |
|     EVOD   |   201407010000 |           2 |  2 |
| All Others |   201407010000 |           5 |  3 |

At this point, I'm at a loss on how to get this final result. I can alter the tables as needed to get it, I can alter the model by using a formula, measure, etc. I've read about ranking using DAX formulas 1, 2, 3 but I can't seem to wrap my head around them enough to be able to accurately get the result.

How can I calculate/store this Top 10 for any month and still be able to splice the data as needed in our tabular model?

Best Answer

I had a similar scenario and used the following DAX query...

First, to make it simple, I defined a measure to use inside the DAX so I don't have to repeat the formula. Then I used the generate to iterate on the TOPN formula:

define measure TableInTabular[NameOfTheMeasure] = COUNTAX(FILTER('Stats', ISBLANK([DeactDate]) = TRUE), 1)
evaluate
 (
  addcolumns
   (  
    filter
     (  
      generate
        (  
         VALUES(DatesTableName[Month]),  
         TOPN (10, VALUES(TableInTabular[ParentID]),TableInTabular[NameOfTheMeasure],0)
        ),
        TableInTabular[NameOfTheMeasure]>0
      ),
      "ActiveCount (or how you want to call this Column)",
      TableInTabular[NameOfTheMeasure]  
    )  
 )  
order by DatesTableName[Month] asc, 
TableInTabular[NameOfTheMeasure] desc

With the above you should have a top 10 ParentID and the Measure by each month. just replace the "TableInTabular" with your tabular table name where you have the data and the "DatesTableName" with the name of the dates table.

Please let me know if I misunderstood your question and hope it helps...