SQL Server – Indexed VIEW with PIVOT for Performance

performancepivotsql serverview

A simplified version of my problem is as follows:

I have three tables Records, Inputs, and Outputs.

Records


Id


DTS  






2015-01-01 00:00


Inputs


Id


RecordId


InputId


Value






1  


1  


100  








1  


1  


105  



Outputs


Id


RecordId


OutputId


Value






1  


1  


200  








1  


1  


10  



This fragmentation was done to avoid limiting the total number of inputs or outputs due to constraints on the number of columns in a table. I am looking for the best (fastest) way to run queries against something similar to the following:


DTS  


Input1


...


InputN


Output1


...


OutputM



2015-01-01 00:00


100  


...


105  


200  


...


10  


My initial plan was to create a view based on pivoting the requisite inputs and outputs and joining them appropriately. The data only needs to be readonly and is essentially static, so I want to prevent it from being expanded for each query. I read that I could add an index to the view to 'materialize' it, but this can't be done if the view uses PIVOT.

Does anyone have any advice on how I might accomplish this? Or if I am approaching this the wrong way, I would happily take suggestions.

Best Answer

Given that the data is changing very infrequently and there will therefore be very little overhead in maintaining the indexed views, an approach using indexed views could help performance. As you likely noticed when trying to create an indexed view that uses PIVOT, there are a lot of restrictions on what syntactic constructs can be used in indexed views.

To get around this limitation, you can use a combination of aggregation (via SUM) and CASE statements. However, you'll then run into another limitation, which is that a view that uses UNION cannot be materialized. However, it is easy to make a separate view for the aggregated inputs and outputs in this case, and that approach is worth testing on your specific workload to see if it yields a performance benefit. Below is a sample script that illustrates both approaches:

-- Set up test data with 3 records, and each record has 2 inputs and 3 outputs
CREATE TABLE dbo.records (Id INT IDENTITY(1,1) NOT NULL, DTS DATETIME NOT NULL)
GO
INSERT INTO dbo.records (DTS)
VALUES  (GETDATE()-2),
        (GETDATE()-1),
        (GETDATE())
GO
CREATE TABLE dbo.inputs (Id INT IDENTITY NOT NULL, recordId INT NOT NULL, inputId INT NOT NULL, value FLOAT NOT NULL)
GO
INSERT INTO dbo.inputs (recordId, inputId, value)
SELECT r.Id AS recordId, i.inputId, r.Id * i.inputId AS value
FROM dbo.records r
CROSS JOIN (VALUES (1), (2)) AS i(inputId)
GO
CREATE TABLE dbo.outputs (Id INT IDENTITY NOT NULL, recordId INT NOT NULL, outputId INT NOT NULL, value FLOAT NOT NULL)
GO
INSERT INTO dbo.outputs (recordId, outputId, value)
SELECT r.Id AS recordId, i.outputId, -1 * r.Id * i.outputId AS value
FROM dbo.records r
CROSS JOIN (VALUES (1), (2), (3)) AS i(outputId)
GO

-- Create the indexed view for inputs
CREATE VIEW dbo.recordInputs WITH SCHEMABINDING AS
SELECT r.Id AS recordId,
    r.DTS,
    -- A COUNT_BIG() is mandatory to create an indexed view
    COUNT_BIG(*) AS numInputs,
    -- Generate the columns input1, ..., inputM
    SUM(CASE WHEN i.inputId = 1 THEN i.value ELSE 0 END) AS Input1,
    SUM(CASE WHEN i.inputId = 2 THEN i.value ELSE 0 END)Input2
FROM dbo.records r
JOIN dbo.inputs i
    ON i.recordId = r.Id
GROUP BY r.Id, r.DTS
GO
CREATE UNIQUE CLUSTERED INDEX UQ_recordInputs ON dbo.recordInputs (recordId)
GO

-- Create the indexed view for outputs
CREATE VIEW dbo.recordOutputs WITH SCHEMABINDING AS
SELECT r.Id AS recordId,
    r.DTS,
    -- A COUNT_BIG() is mandatory to create an indexed view
    COUNT_BIG(*) AS numRows,
    -- Generate the columns output1, ..., outputM
    SUM(CASE WHEN o.outputId = 1 THEN o.value ELSE 0 END) AS Output1,
    SUM(CASE WHEN o.outputId = 2 THEN o.value ELSE 0 END) AS Output2,
    SUM(CASE WHEN o.outputId = 3 THEN o.value ELSE 0 END) AS Output3
FROM dbo.records r
JOIN dbo.outputs o
    ON o.recordId = r.Id
GROUP BY r.Id, r.DTS
GO
CREATE UNIQUE CLUSTERED INDEX UQ_recordOutputs ON dbo.recordOutputs (recordId)
GO

-- Create the overall view
CREATE VIEW dbo.recordInputsAndOutputs WITH SCHEMABINDING AS
SELECT i.recordId, i.DTS, i.input1, i.input2, o.output1, o.output2, o.output3
FROM dbo.recordInputs i WITH (NOEXPAND) /* In non-enterprise versions of SQL, you will likely need this hint to avoid expanding the indexed view */
JOIN dbo.recordOutputs o WITH (NOEXPAND)
    ON o.recordId = i.recordId
GO

-- Test the view (turn on "Include Actual Execution Plan" first in SSMS to confirm usage of the indexed views)
SELECT * 
FROM dbo.recordInputsAndOutputs 
GO

-- Cleanup
DROP VIEW dbo.recordInputsAndOutputs
DROP VIEW dbo.recordInputs
DROP VIEW dbo.recordOutputs
DROP TABLE dbo.inputs
DROP TABLE dbo.outputs
DROP TABLE dbo.records
GO

One caveat to point out is that these indexed views aren't really doing any aggregation. We still have the same number of data elements as before, but they are simply pivoted into a smaller number of rows (and larger number of columns). Even so, I have often observed significant performance gains from operating on a pivoted row set that contains fewer rows to represent the same data. I'll defer to deeper experts on why this might be, but it is fairly intuitive given the row-based processing model where SQL Server needs to pull each row through each query plan operator, presumably incurring some overhead for each row while doing so (at least for row-mode execution, which covers everything except parallel columnstore plans).

One last note is that you will often get better and faster responses if you publish a snippet of SQL code that generates some basic test data in your schema (similar to the "set up test data" section of the script above) in addition to just showing the schema. This makes it much easier for someone to help you out and lowers the possibility of mis-interpreting the question.