SQL Server – SSAS cube processing time increased after making fact and dimension tables smaller

olap, process, sql-server, sql-server-2012, ssas

I have a 55 GB cube that takes around 2 hours to process in full. I have data for the last 4 years, but the business only wants 2 years of data.
To that end, I changed the dimension and fact views to return only the last two years of data. The cube size has now dropped to 32 GB, but processing time has increased by 30 minutes (i.e. to 2 hours 30 minutes). I expected it to drop, since I restricted a good amount of the data going into the cube.
Why has the processing time increased when it should actually have decreased?
How can I reduce the processing time now?

P.S.: I have already tried cube partitioning, but due to the large dimension size it also increases processing time.

I am using views in which I have restricted the data via a WHERE clause.

My view is basically like this:

SELECT V.Col1, V.Col2.... V.Col13 
FROM DimeTable V 
WHERE <my filter clause>

It selects almost all columns from the dimension table, so I can't create a non-clustered index on all of these columns: it would slow down insert operations and wouldn't help me much.

Best Answer

Processing a cube largely consists of three steps:

  1. Getting the data
  2. building indexes
  3. calculating aggregations

Steps 2 and 3 are the least expensive (during processing) in my opinion, so let's start with those.

Building indexes does little more than calculating bitmap indexes for your attribute relationships. Depending on how many of those you have designed, this could take longer or not, but it usually shouldn't take half an hour. It amounts to saying: "hey, if this item group is filtered on, I already know which items are in it, so I can add those up instead of doing a NONEMPTYCROSSJOIN".

Calculating aggregations is the process in which subtotals are calculated for every level where you defined them. If you haven't used any usage-based optimization or attribute hierarchies, and haven't defined any aggregations yourself, these are only calculated at the leaf level of your dimension hierarchies (i.e. the lowest level of every attribute). This is basically: "hey, if I need the sales for this item group, I have precalculated it and don't have to add up the individual items".

A large part of processing time is spent fetching data. If you trace your queries while processing, you will notice that for every single dimension attribute a SELECT DISTINCT attributename FROM dimension_table is executed. If that dimension table is a view, and the view's query is slower after adding the WHERE clause, dimension processing can slow down by roughly time_the_view_is_slower * number_of_attributes.
The number of columns in your view's select list is largely irrelevant; the number of attributes in your dimension is what matters.
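For illustration, the per-attribute queries that SSAS issues during dimension processing look something like the following (the view and attribute names here are hypothetical, not from the question):

```sql
-- What SSAS roughly executes while processing a dimension:
-- one SELECT DISTINCT per attribute, and each one pays the
-- cost of the view's WHERE clause again.
SELECT DISTINCT ItemGroup FROM dbo.vDimItem;   -- attribute 1
SELECT DISTINCT ItemColor FROM dbo.vDimItem;   -- attribute 2
SELECT DISTINCT ItemName  FROM dbo.vDimItem;   -- attribute 3
-- With N attributes, a view that became 10 seconds slower
-- adds roughly N * 10 seconds to dimension processing.
```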

If you have a distinct count measure anywhere, your extra WHERE clause could also have introduced a sort operation because of a new execution plan.

So in your case my guess would be that your dimension views have gotten slower because the valid-date column isn't indexed, or the queries aren't selective enough, and the added processing time of the source queries outweighs the time saved building indexes and aggregations.

You have most likely improved your cube browsing and MDX performance in the process, since fewer cells are calculated with less data in the cube, but the processing time may simply have grown because the source queries became slower, and that slowdown gets multiplied by the number of attributes.

Then again, I'm not sure what the problem is with a larger processing time if you leave everything at the defaults. The query performance of the SSAS solution should matter more than the processing time. If you run into issues because of the processing time, that might be a separate problem that needs fixing in another way.

So I guess, in the end, if you're worried about processing time, either load less data into your source database (which, depending on your setup, could also reduce ETL loading time) or tune all those DISTINCT queries by indexing the view's underlying table accordingly.
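As a minimal sketch of that indexing approach: you don't need to index every column in the view's select list, only the column in the WHERE clause, optionally covering the others with INCLUDE. The column name here is an assumption, since the question doesn't show the filter:

```sql
-- Hypothetical: ValidFrom is the date column used in the view's WHERE clause.
-- A single narrow index on the filter column lets the per-attribute
-- DISTINCT queries seek into the two-year range instead of scanning
-- all four years, without indexing every selected column.
CREATE NONCLUSTERED INDEX IX_DimeTable_ValidFrom
    ON dbo.DimeTable (ValidFrom)
    INCLUDE (Col1, Col2, Col13);   -- cover the view's columns if needed
```

One narrow index like this adds far less insert overhead than indexing every column individually, which addresses the concern about slowing down inserts.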

If you want to know exactly what is slowing down your processing, you could trace all the queries before and after the change and compare the execution times to see which queries are killing you.
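One way to capture those queries on the relational side is an Extended Events session filtered to the Analysis Services connection; this is a sketch, and the session name, file name, and app-name filter are assumptions you may need to adjust for your environment:

```sql
-- Sketch: capture the SQL that SSAS sends during cube processing.
CREATE EVENT SESSION SsasProcessingTrace ON SERVER
ADD EVENT sqlserver.sql_batch_completed
    (WHERE sqlserver.client_app_name LIKE N'%Analysis Services%')
ADD TARGET package0.event_file (SET filename = N'SsasProcessingTrace.xel');

ALTER EVENT SESSION SsasProcessingTrace ON SERVER STATE = START;
-- ...run the cube process, then stop the session:
ALTER EVENT SESSION SsasProcessingTrace ON SERVER STATE = STOP;
```

Run it once before and once after the view change, then compare the durations in the .xel files to see which DISTINCT queries got slower.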