SQL Server – Using Indexed Views for Aggregates: Too Good to Be True?

indexmaterialized-viewperformancequery-performancesql server

We have a data warehouse with a fairly large record count (10-20 million rows) and often run queries that count records between certain dates, or count records with certain flags, e.g.

SELECT
    f.IsFoo,
    COUNT(*) AS WidgetCount
FROM Widgets AS w
JOIN Flags AS f
    ON f.FlagId = w.FlagId
WHERE w.Date >= @startDate
GROUP BY f.IsFoo

The performance isn't awful, but can be relatively sluggish (perhaps 10 seconds on a cold cache).

Recently I discovered that I can use GROUP BY in indexed views and so tried out something similar to the following

CREATE VIEW TestView
WITH SCHEMABINDING
AS
    SELECT
        Date,
        FlagId,
        COUNT_BIG(*) AS WidgetCount
    FROM Widgets
    GROUP BY Date, FlagId;
GO

CREATE UNIQUE CLUSTERED INDEX PK_TestView ON TestView
(
    Date,
    FlagId
);

As a result the performance of my first query is now < 100ms, and the resulting view & index is < 100k (although our row count is large, the range of dates and flag IDs means that this view only contains 1000-2000 rows).

I thought that perhaps this would criple the performance of writes to the Widget table, but no – the performance of inserts and updates into this table is pretty much unaffected as far as I could tell (plus, being a data warehouse this table is updated infrequently anyway)

To me, this seems way too good to be true – is it? What do I need to be careful with when using indexed views in this way?

Best Answer

As you've noted, the view itself only materializes a small number of rows - so even if you update the whole table, the additional I/O involved with updating the view is negligible. You probably already felt the biggest pain you're going to feel when you created the view. The next closest will be if you add a gazillion rows to the base table with a bunch of new IDs that require new rows in the view.

This isn't too good to be true. You're using indexed views exactly how they were meant to be used - or at least one of the most effective ways: to pay for future query aggregations at write time. This works best when the result is much smaller than the source and of course when the aggregations are requested more often than the underlying data is updated (more common in DW than OLTP, generally).

Unfortunately many people think indexing a view is magic - an index won't make all views more efficient, especially views that simply join tables and/or produce the same number of rows as the source (or even multiply). In these cases the I/O from the view is the same or even worse than the original query, not only because there are the same or more rows, but often they are storing and materializing more columns, too. So materializing those in advance doesn't provide any gains, since - even with SSDs - I/O, network, and client processing/rendering still remain the primary bottlenecks in returning large resultsets to the client. The savings you get in avoiding the join at runtime just aren't measurable in comparison to all the other resources you're still using.

Like non-clustered indexes, just be careful to not over-do it. If you add 10 different indexed views to one table, you're going to see more impact to the write portion of your workload, especially if the grouping column(s) are not (in) the clustering key.

Gosh, I've been meaning to blog about this topic.