SQL Server – How Deep is Too Much for Nested Views?

performanceview

I stumbled on a problem using views. The content management system my program interfaces with, defines a view which calls another view on the system. That is 2 levels to begin with. Before I stumble on views, I thought nested views were casual, but after my code blew up and i googled around, i begin to pick up words like "sin" and "vulgar" and "bad" which are associated with nested views.

Well I have two-level nested views to begin with. In my application level, I define my 3rd level and that's when it gives me time out error. Is 3 level too much? The tables are not terribly big — between 5,000 and 100,000 records, while my view itself only produces 2,000 record. Is this suppose to break the camel's back?

Best Answer

It's not the number of levels of views that is the problem, it is the complexity of the resulting query. Having a lot of nested views may contribute to the complexity of the query but a great deal depends on what those views are actually doing.

Having several layers of nested views often creates problems when it means that you are doing unnecessary or duplicative work. If an underlying view, for example, joins tables A, B, and C when you only need information from A and B, the join to C means that the database may have to do unnecessary work (depending on the database engine and the exact definition, the optimizer may be able to work out that the join to C is unnecessary and eliminate it but this doesn't usually happen). If a first level view joins A and B and then a second level view joins to B again because it wants to add some additional columns from B, you've now forced the database to do duplicate work and join to B twice. The same thing happens when you have predicates in different levels of views that force the database to do work that it doesn't have to do.

Broadly, if you have a well-designed series of views where each level has some defined responsibility, you're much more likely to have reasonable performance than if you have an incoherent set of views that have grown up organically over time with different developers adding things to different views to meet that day's requirements. Imagine you have a bug tracking system with the following objects

  • A bug table with all issues,
  • A vw_bug view on top of that that acts as an abstraction layer in case someone wants to refactor the bug table
  • A vw_current_bug view on top of vw_bug that that filters out soft-deleted rows
  • A vw_open_bug view that filters down to open bugs
  • A vw_bug_overview view that joins vw_open_bug to all the lookup tables to present information for the overview page

That's four layers of views with each one providing a well-defined abstraction layer. That's likely to be OK from a performance standpoint.

Having several levels of views can also create problems for the database when it makes it difficult or impossible for the optimizer to figure out the best place to apply a predicate (Google "predicate pushing" for tons of additional information on this). If you have a number of different levels of views but you always specify a where table_a_primary_key = :1 predicate when you query the outer view, the optimizer has to figure out when to apply that predicate. If table_a is the driving table in the inner view, ideally the optimizer figures out that it wants to push that predicate down through all the layers of views and apply it to that driving table before it does the work of the joins. Depending on the database engine, version, and the complexity of the code in your views, however, it may be difficult or impossible for the optimizer to figure out it wants to push the predicate or to push it there without theoretically changing the results. Different database engines and different versions of those engines will have very different behaviors in this respect.

Broadly speaking, this is no different than the challenges you'd have when you have a single view that happens to implement relatively complex logic. Different database engines have different ways to let you circumvent these sorts of issues-- Oracle hints can force the optimizer to push a predicate, Oracle lets you define a pipeline table function that acts like a view but lets you pass in a parameter that you apply where you want, SQL Server table functions act in much the same way, etc.

Those aren't the only two potential issues with having a bunch of levels of views but those are the two big ones in my experience. As with any query, the more complex it gets, the more difficult it is for the optimizer to exhaustively search every possible query plan and the more likely it is that you're going to end up with a less optimal plan. The more complex the query, the more likely you're going to hit a corner case where the optimizer can't correctly work out the cardinality of various steps because it doesn't know how to combine the statistics correctly.

On the other hand, a well-designed set of views can provide useful abstractions that make it easier to define queries correctly. In the bug tracking system from earlier, vw_current_bug provides a useful abstraction layer so that you don't have to filter out soft-deleted rows in hundreds of queries. There may be cases where this abstraction layer has a performance cost but that may be a small cost that you're happy to pay to avoid duplicating code.