The answer by Thomas Stringer is very good, and partitioning here would likely help overall performance and reduce your maintenance cost. However, it won't do anything for your page split scenario.
Can you verify that your FKs are trusted? This won't reduce or affect your page splits, but it will help your query execution time.
-- Builds ALTER TABLE ... WITH CHECK CHECK CONSTRAINT statements to validate
-- every untrusted foreign key and check constraint. Run with results to TEXT,
-- then paste the output into a new query window and execute it.
SELECT 'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(o.object_id)
     + ' WITH CHECK CHECK CONSTRAINT ' + i.name + '
GO'
FROM sys.foreign_keys i
JOIN sys.objects o ON i.parent_object_id = o.object_id
JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE i.is_not_trusted = 1
  AND i.is_not_for_replication = 0
UNION
SELECT 'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(o.object_id)
     + ' WITH CHECK CHECK CONSTRAINT ' + i.name + '
GO'
FROM sys.check_constraints i
JOIN sys.objects o ON i.parent_object_id = o.object_id
JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE i.is_not_trusted = 1
  AND i.is_not_for_replication = 0;
Another idea you already hit on is fill factor: a lower fill factor leaves free space on each page to absorb new rows, while row or page compression increases the number of rows that fit per page. Again, this won't eliminate page splits, but together they can reduce how often splits occur.
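As a sketch, both settings can be applied in one index rebuild. The index and table names here are hypothetical, and FILLFACTOR = 80 is only an illustrative value to tune against your insert pattern:

```sql
-- Rebuild with free space left on each page and ROW compression applied.
-- IX_Log_FK and dbo.LogTable are placeholder names, not from the question.
ALTER INDEX IX_Log_FK ON dbo.LogTable
REBUILD WITH (FILLFACTOR = 80, DATA_COMPRESSION = ROW);
```

Note that fill factor is only honored at rebuild/reorganize time; pages fill back up as inserts arrive, so this needs to be part of regular index maintenance.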
Lastly, if you can, look at changing your index to lead with the log_time column instead of the foreign key column. Depending on how the log_time data arrives (hopefully more "in order" than the FK column), this could decrease your page splits significantly. If you pair this with ROW compression and a good partition scheme, you might see some significant improvements.
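Putting that together, here is a sketch of the reordered index. The table name, FK column name, and partition scheme are assumptions based on the question, not a definitive implementation:

```sql
-- log_time leads so mostly-ascending inserts land at the logical end
-- of the index instead of splitting pages in the middle.
-- dbo.LogTable, fk_column, and ps_LogByMonth are placeholder names;
-- the partition scheme is assumed to already exist.
CREATE NONCLUSTERED INDEX IX_Log_LogTime_FK
    ON dbo.LogTable (log_time, fk_column)
    WITH (DATA_COMPRESSION = ROW)
    ON ps_LogByMonth (log_time);
```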
Answer
Since you refer to the website use-the-index-luke.com, consider the chapter:
Use The Index, Luke › The Where Clause › Searching For Ranges › Greater, Less and BETWEEN
It has an example that matches your situation perfectly (two-column index, one is tested for equality, the other for range), explains (with more of those nice index graphics) why @ypercube's advice is accurate and sums it up:
Rule of thumb: index for equality first — then for ranges.
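As an illustration of the rule (the table and column names are hypothetical, loosely following the book's sales example):

```sql
-- Equality column first, range column second.
CREATE INDEX idx_sales ON sales (subsidiary_id, sale_date);

-- The index can then satisfy both predicates with one contiguous scan:
SELECT *
FROM   sales
WHERE  subsidiary_id = 7                              -- equality on the leading column
AND    sale_date BETWEEN DATE '2024-01-01'
                     AND DATE '2024-03-31';           -- range on the second column
```

With the column order reversed, the range predicate on the leading column would force the scan across many subsidiaries' entries.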
Also good for just one column?
What to do for queries on just one column seems clear. More details and benchmarks on that can be found under this related question:
Less selective column first?
Apart from that, what if you have only equality conditions for both columns?
It doesn't matter for such queries. Put the column first that is more likely to receive conditions of its own, where the order actually matters.
Consider this demo, or reproduce it yourself. I create a simple table of two columns with 100k rows. One with very few, the other one with lots of distinct values:
CREATE TEMP TABLE t AS
SELECT (random() * 10000)::int AS lots
, (random() * 4)::int AS few
FROM generate_series (1, 100000);
DELETE FROM t WHERE random() > 0.9; -- create some dead tuples, more "real-life"
ANALYZE t;
SELECT count(distinct lots) -- 9999
, count(distinct few) -- 5
FROM t;
Query:
SELECT *
FROM t
WHERE lots = 2345
AND few = 2;
EXPLAIN (ANALYZE, BUFFERS) output (best of 10 to exclude caching effects):
Seq Scan on t (cost=0.00..5840.84 rows=2 width=8)
(actual time=5.646..15.535 rows=2 loops=1)
Filter: ((lots = 2345) AND (few = 2))
Buffers: local hit=443
Total runtime: 15.557 ms
Add index, retest:
CREATE INDEX t_lf_idx ON t(lots, few);
Index Scan using t_lf_idx on t (cost=0.00..3.76 rows=2 width=8)
(actual time=0.008..0.011 rows=2 loops=1)
Index Cond: ((lots = 2345) AND (few = 2))
Buffers: local hit=4
Total runtime: 0.027 ms
Add other index, retest:
DROP INDEX t_lf_idx;
CREATE INDEX t_fl_idx ON t(few, lots);
Index Scan using t_fl_idx on t (cost=0.00..3.74 rows=2 width=8)
(actual time=0.007..0.011 rows=2 loops=1)
Index Cond: ((few = 2) AND (lots = 2345))
Buffers: local hit=4
Total runtime: 0.027 ms
Best Answer
If the alternative is to have a data structure that efficiently filters by tenant, then no.
If TenantId is the leading column in your clustered index, then single-tenant queries will be pretty fast. The nonclustered columnstore index (NCCI) row groups, however, will not be segregated by tenant, so all the row groups will need to be scanned to find rows for a single tenant.
As rows are inserted, they land in the "middle" of the clustered index, typically at the "end" of the rows for that tenant. But rows are always inserted into the delta store for a CCI/NCCI, and whenever the delta store fills (at 1,048,576 rows), it is compressed into a columnar row group. So each row group ends up holding the last million rows inserted across all tenants.
You can fix this by partitioning on TenantID, which gives each tenant (or group of tenants) a separate physical rowset and its own NCCI partition. Each partition will have its own delta store, which fills up with single-tenant rows. But if you do partition, each partition might end up too small for a columnstore to be useful; you'll want at least a few million rows per partition.
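A minimal sketch of that partitioning approach; the object names, boundary values, and column list are assumptions, not from the question:

```sql
-- Map TenantId ranges to separate partitions (all on PRIMARY for simplicity;
-- the boundary values 1000, 2000, 3000 are purely illustrative).
CREATE PARTITION FUNCTION pf_Tenant (int)
    AS RANGE RIGHT FOR VALUES (1000, 2000, 3000);

CREATE PARTITION SCHEME ps_Tenant
    AS PARTITION pf_Tenant ALL TO ([PRIMARY]);

-- Hypothetical fact table partitioned by tenant.
CREATE TABLE dbo.Facts (
    TenantId int           NOT NULL,
    LogTime  datetime2     NOT NULL,
    Amount   decimal(18,2) NOT NULL
) ON ps_Tenant (TenantId);

-- The NCCI is partition-aligned with the table, so each partition gets
-- its own delta store and, eventually, single-tenant row groups.
CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_Facts
    ON dbo.Facts (TenantId, LogTime, Amount);
```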
In general with multi-tenant data, you want to avoid interleaving tenant data in a way that requires reading all tenants' data to retrieve a single tenant's data.