Sql-server – Columnstore index on multi-tenant database

columnstoreindexindex-tuningsql server

I have a sql azure database that is used only for analytics.

It is constantly being updated with data pushed from the OLTP system.

The OLTP inserts/updates/deletes are pushed via webhooks to a service that propagates the changes into the analytics db in real time.

As a result, we have a db that must sustain both a OLTP and analytics work loads, althought the analytics queries performance is more important.

I haven't used columnstore indexes before, but after doing minimal research it seems like this type of payload can be good fit for nonclustered columnstore indexes. The clustered index would stay rowstore, for fast insert/update/delete.

But, I have an extra requirement that I haven't seem mentioned: multi-tenancy.
When users use our analytics app, they are always associated with a single tenant id, and the analytics queries will always have a WHERE clause on the tenant id. The tenant id column is highly selective. The biggest tenant makes up for less than 5% of all rows and most tenants make up for less than 1% of rows.

Is it still a good idea to use NCCI indexes for multitenants database? If so, should I set up my index in a particular way?

Best Answer

Is it still a good idea to use NCCI indexes for multitenants database?

If the alternative is to have a data structure that efficiently filters by tenant, then no.

If TenantId is the leading column in your clustered index, then single-tenant queries will be pretty fast. The NCCI row groups will not be segregated by tenant, and so all the row groups will need to be scanned to find rows for a single tenant.

As rows are inserted, they are inserted into the "middle" of the clustered index, typically at the "end" of the rows for the tenant. But rows are always inserted into the delta store for an CCI/NCCI. And whenever the delta store has 1 million rows in it, it's rebuilt into a columnar row group. So each row group will end up with the last million rows inserted across all tenants.

You can fix this with partitioning by TenantID, which could give each tenant (or group of tenants) a separate physical table (or NCCI). Each partition will have its own delta store, which will fill up with single-tenant rows. If you do partition, each partition might end up too small for a Columnstore to be useful. You'll want a few million rows per partition at least.

In general with multi-tenant data, you want to avoid interleaving tenant data in a way that requires you to read all tenants data to retrieve a single tenant's data.