SQL Server 2012 – Index Optimization on a View Using Full Outer Join

index, normalization, sql-server-2012, view

Table structure:

Foo             FooChild                Bar
---             --------                ---
ID              ID                      ID
Date            FooID                   Date
GroupID         UserID                  UserID
Notes           Amount                  GroupID
                                        IsComplete

Foo has a unique index on Date+GroupID
FooChild has a FK to Foo, and a unique index on FooID+UserID, which includes Amount
Bar has a unique index on Date+UserID+GroupID, which includes IsComplete
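For readers who want to reproduce the setup, the description above could be sketched as DDL roughly like this. Only the column lists and the indexed columns come from the description; the data types, nullability, primary keys on ID, and all constraint and index names are assumptions.

```sql
-- Sketch only: types, nullability, and object names are assumptions.
CREATE TABLE dbo.Foo
(
    ID      int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Foo PRIMARY KEY,
    [Date]  date NOT NULL,
    GroupID int NOT NULL,
    Notes   nvarchar(max) NULL
);
CREATE UNIQUE INDEX UQ_Foo_Date_GroupID
    ON dbo.Foo ([Date], GroupID);

CREATE TABLE dbo.FooChild
(
    ID     int IDENTITY(1,1) NOT NULL CONSTRAINT PK_FooChild PRIMARY KEY,
    FooID  int NOT NULL CONSTRAINT FK_FooChild_Foo REFERENCES dbo.Foo (ID),
    UserID int NOT NULL,
    Amount decimal(18, 2) NOT NULL
);
-- Unique on FooID+UserID, with Amount carried at the leaf level
CREATE UNIQUE INDEX UQ_FooChild_FooID_UserID
    ON dbo.FooChild (FooID, UserID) INCLUDE (Amount);

CREATE TABLE dbo.Bar
(
    ID         int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Bar PRIMARY KEY,
    [Date]     date NOT NULL,
    UserID     int NOT NULL,
    GroupID    int NOT NULL,
    IsComplete bit NOT NULL
);
-- Unique on Date+UserID+GroupID, with IsComplete carried at the leaf level
CREATE UNIQUE INDEX UQ_Bar_Date_UserID_GroupID
    ON dbo.Bar ([Date], UserID, GroupID) INCLUDE (IsComplete);
```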

Now I need to create a report showing the sum of all FooChild amounts alongside the count of complete Bars for any given date range. The users also want to be able to see the stats per group or per user. This would seem to be a great place to write a view:

create view vFooBar as
select f.Date, f.GroupID, fc.UserID, fc.Amount, b.IsComplete
from Foo f join FooChild fc on fc.FooID = f.ID
left join Bar b on f.Date = b.Date and f.GroupID = b.GroupID and fc.UserID = b.UserID
union
select b.Date, b.GroupID, b.UserID, x.Amount, b.IsComplete
from Bar b left join 
    (select f.Date, f.GroupID, fc.UserID, fc.Amount
    from Foo f join FooChild fc on fc.FooID = f.ID) x
on x.Date = b.Date and x.GroupID = b.GroupID and x.UserID = b.UserID

(Here's why I wrote the view this way.)

Now I can easily write queries like this:

select UserID, sum(Amount) FooAmount, sum(cast(IsComplete as int)) CompletedBars
from vFooBar
where Date between @fromDate and @toDate
group by UserID

But there's a snag here. As soon as the date range starts getting relatively big, the execution plan goes all pear shaped. It uses the date index on Foo, but instead of then using the FooID index on FooChild, it does a clustered index scan, then a hash match on FooID to join with the results from Foo. And it does that twice in the overall plan; I'm guessing once for each aggregate. And that really hurts.

I understand that using the index I created on FooChild may not be efficient, since the values of FooID for a given date might not be contiguous, though typically they are inserted in roughly the same order.

I could denormalize, and add Date and GroupID to the FooChild table, then index those columns, and I'm pretty sure that would improve performance a lot. But it just doesn't feel right.

Any other ideas?

Best Answer

The optimizer makes choices based on costing estimates. The cost model is generic, and may not always choose optimal plans for your particular hardware, and its assumptions may not always be valid for your circumstances.

In this case, the optimizer assesses a hash join as the cheaper option over nested loops when the estimated number of rows to be joined is large. If you are certain that a nested loops join will always be preferable to a hash join, you could consider (and test!) forcing a seek instead of a scan of the FooChild table in the view:

SELECT 
    f.TheDate, 
    f.GroupID, 
    fc.UserID, 
    fc.Amount, 
    b.IsComplete
FROM dbo.Foo AS f 
JOIN dbo.FooChild AS fc WITH (FORCESEEK) -- New hint
    ON fc.FooID = f.ID
LEFT JOIN dbo.Bar AS b 
    ON f.TheDate = b.TheDate 
    AND f.GroupID = b.GroupID 
    AND fc.UserID = b.UserID
UNION
SELECT 
    b.TheDate, 
    b.GroupID, 
    b.UserID, 
    x.Amount, 
    b.IsComplete
FROM dbo.Bar AS b 
LEFT JOIN 
(
    SELECT 
        f.TheDate, 
        f.GroupID, 
        fc.UserID, 
        fc.Amount
    FROM dbo.Foo AS f 
    JOIN dbo.FooChild AS fc WITH (FORCESEEK) -- New hint
        ON fc.FooID = f.ID
) AS x
    ON x.TheDate = b.TheDate 
    AND x.GroupID = b.GroupID 
    AND x.UserID = b.UserID;

Side note: while this transformation from the original full join is valid given the current uniqueness constraints on your tables, please review the answer to your previous question and consider rewriting the full join as suggested in my edit.
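One way to act on the "(and test!)" advice is to run the report query against both versions of the view and compare logical reads and elapsed time. The sketch below assumes the hinted definition has been saved as a second view, dbo.vFooBar_ForceSeek, next to the original dbo.vFooBar, and that both expose the date column as TheDate as in the code above. The second view name and the date range are hypothetical.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hypothetical date range; use one wide enough to trigger the hash join plan
DECLARE @fromDate date = '20120101',
        @toDate   date = '20121231';

-- Original view: the optimizer is free to choose scan + hash join
SELECT UserID,
       SUM(Amount) AS FooAmount,
       SUM(CAST(IsComplete AS int)) AS CompletedBars
FROM dbo.vFooBar
WHERE TheDate BETWEEN @fromDate AND @toDate
GROUP BY UserID
OPTION (RECOMPILE); -- lets the optimizer see the actual variable values

-- Hinted view (assumed name): FooChild must be accessed with a seek
SELECT UserID,
       SUM(Amount) AS FooAmount,
       SUM(CAST(IsComplete AS int)) AS CompletedBars
FROM dbo.vFooBar_ForceSeek
WHERE TheDate BETWEEN @fromDate AND @toDate
GROUP BY UserID
OPTION (RECOMPILE);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the two sets of figures across several date ranges, small and large, gives a better picture than a single run, since the hint applies to every query that uses the view.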

Related Solutions

Mysql – Slow MySQL query, not sure why

SUGGESTION #1 : Don't use VIEWS

Views are notorious for acting up with Query Optimization

According to the MySQL documentation:

Indexes can be used for views processed using the merge algorithm. However, a view that is processed with the temptable algorithm is unable to take advantage of indexes on its underlying tables (although indexes can be used during generation of the temporary tables).
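As a rough illustration of the difference in MySQL, the sketch below defines one view that can be merged and one that cannot. It reuses the taxes table and its propertysid and amount columns from the index suggestion further down; the view names and the filter value are hypothetical.

```sql
-- A plain filter/projection view can be merged into the outer query,
-- so indexes on taxes remain usable.
CREATE ALGORITHM = MERGE VIEW v_taxes_rows AS
SELECT propertysid, amount
FROM taxes;

-- A view containing an aggregate cannot be merged; MySQL materializes it
-- into a temporary table before the outer query runs.
CREATE ALGORITHM = TEMPTABLE VIEW v_taxes_totals AS
SELECT propertysid, SUM(amount) AS total_amount
FROM taxes
GROUP BY propertysid;

-- Compare the plans: the second is expected to show a derived table.
EXPLAIN SELECT * FROM v_taxes_rows   WHERE propertysid = 42;
EXPLAIN SELECT * FROM v_taxes_totals WHERE propertysid = 42;
```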

Percona calls Views a Performance Troublemaker

SUGGESTION #2 : Use another index (Optional)

You definitely need an additional index to assist the query

ALTER TABLE taxes ADD INDEX propertysid_amount_ndx (propertysid,amount);
ALTER TABLE taxes DROP INDEX propertysid;

That way, all the data needed for the view are in the index only. The other two indexes are not enough. Why do I say that? Even though the propertysid index was used, the amount has to be retrieved from the table. Essentially, the query passes through both the index and the table.
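To check whether the new index actually covers the query, EXPLAIN can be run before and after the change. The query below is only an assumed stand-in for whatever the view computes; the table and column names come from the ALTER TABLE statements above.

```sql
-- Hypothetical aggregate over the two indexed columns
EXPLAIN
SELECT propertysid, SUM(amount) AS total_amount
FROM taxes
GROUP BY propertysid;
```

When the index covers the query, the Extra column of the EXPLAIN output typically includes "Using index", meaning the rows were read from the index alone without visiting the table.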

Sql-server – How much of a view is persisted when you create an index

All three columns are persisted to disk in the clustered index on the indexed view (no different, really, from a clustered index on a regular table). You can validate this using DBCC PAGE. I created the following structure in tempdb:

USE tempdb;
GO

CREATE TABLE dbo.a(a INT);
CREATE TABLE dbo.b(a INT, b DATETIME);
CREATE TABLE dbo.c(a INT, c CHAR(32));

INSERT dbo.a SELECT 1;
INSERT dbo.b SELECT 1, SYSDATETIME();
INSERT dbo.c SELECT 1, REPLICATE('c', 32);
GO

CREATE VIEW dbo.v
WITH SCHEMABINDING 
AS
  SELECT a.a, b.b, c.c
  FROM dbo.a
  INNER JOIN dbo.b
  ON a.a = b.a
  INNER JOIN dbo.c
  ON a.a = c.a;
GO

CREATE UNIQUE CLUSTERED INDEX x ON dbo.v(a);
GO

I turned on trace flag 3604, which sends DBCC output to the client, so I could use DBCC IND and DBCC PAGE:

DBCC TRACEON(3604, -1);
DBCC IND('tempdb', 'dbo.v', 1);

Results:

(screenshot of the DBCC IND output)

So I knew to check page 312:

DBCC PAGE(2, 1, 312, 3);

And here are the three values from the join, even though they're not all part of the unique aspect of the clustered index:

(screenshot of the DBCC PAGE output)
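A less low-level way to see the same thing is to query the view with the NOEXPAND hint, which makes SQL Server read from the view's own clustered index instead of expanding the definition back to the base tables. This is a sketch against the tempdb objects created above, followed by cleanup.

```sql
USE tempdb;
GO

-- All three columns are returned from the indexed view's clustered index
SELECT a, b, c
FROM dbo.v WITH (NOEXPAND);
GO

-- Clean up: the schema-bound view must be dropped before its base tables
DBCC TRACEOFF(3604, -1);
DROP VIEW dbo.v;
DROP TABLE dbo.a, dbo.b, dbo.c;
GO
```

In the execution plan for the SELECT, the only data access operator should be a scan of index x on dbo.v.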
