SQL Server – need suggestions to improve view performance

database-design, performance, query-performance, sql-server, sql-server-2008, view

I am looking to improve the performance of a view in SQL Server 2008. This view exists in a reporting database that is widely used by not-very-technical folks; it basically denormalizes all of these attributes of a person.

It's a really complicated, long-running view. We have over 19 million people, and there is a lot of logic that goes into each column. For example, there is an indicator as to whether or not a person is deceased, and that relies on three CTEs (common table expressions) and a case statement.

Basically, it's a nightmare.

I need to figure out a way to improve the performance. Changing it to a table is not possible – data must be up-to-the-second accurate. Changing it to an indexed view is right out – it uses data from multiple databases. I can't really modify the column structure as it would break a number of existing reports.

Are there any tools in the toolbox that might help? I'm wondering if stored procedures or functions might help. Maybe a table with computed columns? I would be able to pull each person's identifying information on a nightly basis and store it in a table, but the vast majority of columns rely on live data.

Best Answer

Views are typically not implemented for performance. And while you currently cannot implement explicit indexed views (which are just views whose results SQL Server materializes and maintains for you), you can certainly maintain those facts manually yourself.

For example, you mention that you currently calculate "whether someone is dead or not" using three CTEs and a CASE expression (sorry, to be pedantic, it's not a statement).

Instead of referencing this set of CTEs every time the view is accessed, why not put that fact in a table (potentially along with other facts that have to be calculated per user), and calculate that periodically in the background? So maybe every 5 minutes (that is just a SWAG, you'll have to determine what's appropriate), you run a SQL Server Agent job that re-populates the table based on what it currently knows is the truth. Now the view just has to reference the table that is the output of that script, instead of calculating it over and over again while the users wait. So for example:

CREATE TABLE dbo.PersonProperties
(
  -- one row per person; FK back to the main person table
  PersonID INT PRIMARY KEY REFERENCES dbo.Persons(PersonID),
  -- pre-calculated fact, refreshed by the background job
  IsDead BIT NOT NULL DEFAULT 0
);

Now the job can simply merge that table with the results of the CTEs, and then the view can include a reference to that table, which simply pulls the BIT column along with a join on the PK. This should be MUCH less expensive at query time than re-evaluating all of that logic every time.
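For illustration, here is a sketch of what that job step might look like. The CTE here (DeceasedLogic) and the dbo.DeathRecords table are made-up stand-ins for your actual three-CTE calculation:

;WITH DeceasedLogic AS
(
  -- stand-in for the existing three CTEs + CASE expression
  SELECT p.PersonID,
         CAST(CASE WHEN d.DeathDate IS NOT NULL
                   THEN 1 ELSE 0 END AS BIT) AS IsDead
  FROM dbo.Persons AS p
  LEFT JOIN dbo.DeathRecords AS d   -- hypothetical source table
    ON d.PersonID = p.PersonID
)
MERGE dbo.PersonProperties AS tgt
USING DeceasedLogic AS src
  ON tgt.PersonID = src.PersonID
WHEN MATCHED AND tgt.IsDead <> src.IsDead THEN
  UPDATE SET tgt.IsDead = src.IsDead
WHEN NOT MATCHED BY TARGET THEN
  INSERT (PersonID, IsDead) VALUES (src.PersonID, src.IsDead);

In the view itself, those three CTEs then collapse to a LEFT JOIN against dbo.PersonProperties on PersonID and a simple read of the IsDead column.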

To minimize blocking (e.g. when users are accessing the view at the same time the job is running), you can implement what I call "schema switch-a-roo," which I blogged about here:

http://www.sqlperformance.com/2012/08/t-sql-queries/t-sql-tuesday-schema-switch-a-roo

So instead of locking resources on the expensive query throughout the operation, the only blocking that happens is when the metadata switch actually takes place.
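The heart of that technique is a handful of metadata-only schema transfers. Roughly (the shadow and holder schema names are illustrative; see the post for the full pattern, including the prep work):

-- assumes CREATE SCHEMA shadow; and CREATE SCHEMA holder; already ran,
-- and that the job just rebuilt shadow.PersonProperties with fresh data
BEGIN TRANSACTION;
  ALTER SCHEMA holder TRANSFER dbo.PersonProperties;    -- live copy out of the way
  ALTER SCHEMA dbo    TRANSFER shadow.PersonProperties; -- fresh copy goes live
  ALTER SCHEMA shadow TRANSFER holder.PersonProperties; -- old copy kept for next refresh
COMMIT TRANSACTION;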


This works as long as you can afford some brief periods where the data is not accurate. You can tighten that window to be pretty narrow, but there is always a chance that a person will die in between, and for a brief moment a query will return that they are still alive. If you can't afford this, then make updating the new table part of whatever process first introduces that fact to the database, so that both the CTEs' sources and the new table reflect it immediately.
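For example, if deaths enter the system through a stored procedure, that procedure can update the pre-calculated table in the same transaction (all names here are hypothetical):

CREATE PROCEDURE dbo.RecordDeath
  @PersonID  INT,
  @DeathDate DATE
AS
BEGIN
  SET NOCOUNT ON;
  BEGIN TRANSACTION;
    -- the source of truth the CTEs read from
    INSERT dbo.DeathRecords (PersonID, DeathDate)
      VALUES (@PersonID, @DeathDate);
    -- the pre-calculated fact the view reads from
    UPDATE dbo.PersonProperties
      SET IsDead = 1
      WHERE PersonID = @PersonID;
  COMMIT TRANSACTION;
END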

Still not good enough? Then flag a user as "dirty" the second a change for them comes in. The view can union or left join with the stale data for users that are "clean" and go after live data only for the users that are "dirty."
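A rough sketch of that shape, assuming you add an IsDirty flag to the table above (again, LiveLogic and dbo.DeathRecords stand in for your real calculation):

-- assumes: ALTER TABLE dbo.PersonProperties ADD IsDirty BIT NOT NULL DEFAULT 0;
CREATE VIEW dbo.PersonDeceasedStatus
AS
WITH LiveLogic AS
(
  -- stand-in for the existing three CTEs + CASE expression
  SELECT p.PersonID,
         CAST(CASE WHEN d.DeathDate IS NOT NULL
                   THEN 1 ELSE 0 END AS BIT) AS IsDead
  FROM dbo.Persons AS p
  LEFT JOIN dbo.DeathRecords AS d
    ON d.PersonID = p.PersonID
)
SELECT pp.PersonID, pp.IsDead            -- cheap, pre-calculated rows
FROM dbo.PersonProperties AS pp
WHERE pp.IsDirty = 0
UNION ALL
SELECT ll.PersonID, ll.IsDead            -- live calculation, dirty rows only
FROM LiveLogic AS ll
JOIN dbo.PersonProperties AS pp
  ON pp.PersonID = ll.PersonID
WHERE pp.IsDirty = 1;

The background job then clears the flag whenever it refreshes a row, so the expensive branch only ever touches the handful of users with in-flight changes.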