A simplified version of my problem is as follows:
I have three tables Records, Inputs, and Outputs.
Records

| Id | DTS              |
|----|------------------|
| 1  | 2015-01-01 00:00 |
| …  | …                |
Inputs

| Id | RecordId | InputId | Value |
|----|----------|---------|-------|
| 1  | 1        | 1       | 100   |
| …  | …        | …       | …     |
| N  | 1        | N       | 105   |
| …  | …        | …       | …     |
Outputs

| Id | RecordId | OutputId | Value |
|----|----------|----------|-------|
| 1  | 1        | 1        | 200   |
| …  | …        | …        | …     |
| M  | 1        | M        | 10    |
| …  | …        | …        | …     |
This fragmentation was done to avoid capping the total number of inputs or outputs, given the limit on the number of columns in a table. I am looking for the best (fastest) way to run queries that produce a result shaped like the following:
| DTS              | Input1 | ... | InputN | Output1 | ... | OutputM |
|------------------|--------|-----|--------|---------|-----|---------|
| 2015-01-01 00:00 | 100    | ... | 105    | 200     | ... | 10      |
| …                | …      | …   | …      | …       | …   | …       |
My initial plan was to create a view that pivots the requisite inputs and outputs and joins them appropriately. The data is read-only and essentially static, so I want to avoid re-computing the pivot on every query. I read that I could add an index to the view to 'materialize' it, but this can't be done if the view uses PIVOT.
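For context, the kind of pivoting view I had in mind looks roughly like this (a sketch with assumed object names and truncated column lists; the IN lists would have to enumerate every InputId and OutputId):

```sql
-- Sketch only: dbo.RecordGrid and the column types are assumptions.
-- MAX works as the pivot aggregate because each (RecordId, InputId)
-- pair has a single Value.
CREATE VIEW dbo.RecordGrid AS
SELECT r.DTS,
       pi.Input1, pi.Input2,   -- ... up to InputN
       po.Output1, po.Output2  -- ... up to OutputM
FROM dbo.Records AS r
JOIN (
    SELECT RecordId, [1] AS Input1, [2] AS Input2
    FROM (SELECT RecordId, InputId, Value FROM dbo.Inputs) AS s
    PIVOT (MAX(Value) FOR InputId IN ([1], [2])) AS p
) AS pi ON pi.RecordId = r.Id
JOIN (
    SELECT RecordId, [1] AS Output1, [2] AS Output2
    FROM (SELECT RecordId, OutputId, Value FROM dbo.Outputs) AS s
    PIVOT (MAX(Value) FOR OutputId IN ([1], [2])) AS p
) AS po ON po.RecordId = r.Id;
```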
Does anyone have any advice on how I might accomplish this? Or if I am approaching this the wrong way, I would happily take suggestions.
Best Answer
Given that the data changes very infrequently, there will be very little overhead in maintaining indexed views, so an indexed-view approach could help performance. As you likely noticed when trying to create an indexed view that uses PIVOT, there are many restrictions on the syntactic constructs allowed in indexed views. To get around this limitation, you can use a combination of aggregation (via SUM) and CASE expressions. You'll then run into another limitation: a view that uses UNION cannot be materialized. In this case, however, it is easy to create separate views for the aggregated inputs and outputs, and that approach is worth testing on your specific workload to see whether it yields a performance benefit. Below is a sample script that illustrates both approaches.

One caveat to point out is that these indexed views aren't really doing any aggregation. We still have the same number of data elements as before; they are simply pivoted into a smaller number of rows (and a larger number of columns). Even so, I have often observed significant performance gains from operating on a pivoted row set that represents the same data in fewer rows. I'll defer to deeper experts on exactly why, but it is fairly intuitive given SQL Server's row-based processing model: each row must be pulled through each query plan operator, presumably incurring some per-row overhead along the way (at least for row-mode execution, which covers everything except parallel columnstore plans).
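A minimal sketch of the SUM/CASE indexed view for the inputs side might look like this (object names, column types, and the fixed list of InputId values are assumptions based on the schema in the question):

```sql
-- Sketch only: indexed views require WITH SCHEMABINDING, two-part names,
-- and a COUNT_BIG(*) column whenever GROUP BY is used.
CREATE VIEW dbo.PivotedInputs
WITH SCHEMABINDING
AS
SELECT
    i.RecordId,
    -- ELSE 0 keeps each SUM non-nullable, another indexed-view requirement
    SUM(CASE WHEN i.InputId = 1 THEN i.Value ELSE 0 END) AS Input1,
    SUM(CASE WHEN i.InputId = 2 THEN i.Value ELSE 0 END) AS Input2,
    -- ... one CASE expression per input column, up to InputN ...
    COUNT_BIG(*) AS RowCnt
FROM dbo.Inputs AS i
GROUP BY i.RecordId;
GO

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX IX_PivotedInputs
    ON dbo.PivotedInputs (RecordId);
GO
```

A matching dbo.PivotedOutputs view over Outputs can be defined the same way, and the final query joins Records to both views on RecordId.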
One last note: you will often get better and faster answers if you include a snippet of SQL that generates some basic test data for your schema (similar to the "set up test data" section of the script above), rather than just showing the schema. That makes it much easier for someone to help you out and reduces the chance of misinterpreting the question.
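For instance, a self-contained snippet along these lines (the column types here are assumptions, since the question doesn't state them) is usually enough:

```sql
-- Minimal test-data setup matching the schema described in the question.
CREATE TABLE dbo.Records (Id int PRIMARY KEY, DTS datetime2 NOT NULL);
CREATE TABLE dbo.Inputs  (Id int PRIMARY KEY, RecordId int NOT NULL,
                          InputId int NOT NULL, Value int NOT NULL);
CREATE TABLE dbo.Outputs (Id int PRIMARY KEY, RecordId int NOT NULL,
                          OutputId int NOT NULL, Value int NOT NULL);

INSERT dbo.Records VALUES (1, '2015-01-01 00:00');
INSERT dbo.Inputs  VALUES (1, 1, 1, 100), (2, 1, 2, 105);
INSERT dbo.Outputs VALUES (1, 1, 1, 200), (2, 1, 2, 10);
```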