How to index cube style queries with changing aggrigations

azure-sql-data-warehouseexecution-planperformancequery-performance

I've got a query that emulates the CUBE function.

This is many different aggregations unioned together.

select x,y,z, sum(a) from tbl group by x,y,z
union all
select x,'all',z, sum(a) from tbl group by x,z
union all
select 'all',y,z, sum(a) from tbl group by y,z
union all
select x,'all','all', sum(a) from tbl group by x
etc.

Unfortunately, this is really slow. Everything is distribution aligned. Analysing the estimated query plans shows that a columnstore index always uses hash match to perform this operation. A clustered index will allow one of the queries to use a stream aggregate, although I was expecting a few more ( see plan below)

enter image description here

I was hoping that the below index:

clustered index (x,y,z)

would enable stream aggrigate for the below groupings

x,y,z

x,y

x

Is there something else I'm overlooking?

Best Answer

I am an Oracle developer. Relating to how CI is implemented in Oracle, here are a few pitfalls to avoid.

  1. Ensure that the table tbl is small enough to avoid overflow segments.
  2. The table tbl must be ideally used only for bulk Inserts and Selects. Updates, Deletes and interspersed Inserts will cause page split.
  3. If there are other non clustered indexes on tbl, ensure that those indexes do not end up with huge ClusteringFactor post converting tbl to CI.
  4. Sharding tbl might help provided it does not affect other indexes and SQL performance.