Sql-server – View columns elimination

execution-plan, optimization, sql server, sql-server-2012

I'm a little confused about how the query optimizer eliminates unnecessary columns from a view's query when you only need to select one specific column.

Here is my view:

CREATE VIEW Schema1.Object1
AS
  SELECT Object2.Column1  AS Column2,
         Object2.Column3  AS Column4,
         Object3.Column5,
         Object3.Column6,
         Object4.Column1  AS Column7,
         Object5.Column1  AS Column8,
         Object5.Column9  AS Column10,
         Object5.Column11 AS Column12,
         Object5.Column13 AS Column14,
         Object5.Column15 AS Column16,
         Object5.Column17 AS Column18,
         Object5.Column19 AS Column20,
         Object5.Column21 AS Column22,
         Object5.Column23 AS Column24,
         CASE
           WHEN Object2.Column25 >= Object5.Column25
             THEN Object2.Column25
           ELSE Object5.Column25
         END              AS Column25,
         Object2.Column26
  FROM   Schema1.Object6 AS Object2
         CROSS JOIN Schema2.Object7 AS Object4
         JOIN Schema1.Object8 AS Object3
           ON Object3.Column5 = Object2.Column5
              AND Object3.Column7 = Object4.Column1
         LEFT JOIN Schema1.Object9 AS Object5
           ON Object5.Column4 = Object2.Column3
         JOIN Schema3.Object10 AS Object11
           ON Object5.Column26 = Object11.Column27

Now, I just want to do a query like:

SELECT Column2 
FROM Schema1.Object1

…but the estimated plan includes all columns from the view, even though the query doesn't return them.

Why has this happened? Can I avoid this?

Here is a link to the query plan: https://1drv.ms/u/s!AhdjYi359YDTgYF6sLo8fBsj5H6Ilg

It is also available at https://www.brentozar.com/pastetheplan/?id=S1MhIB3Ff

Best Answer

Each node in the plan only projects the columns minimally needed to satisfy the query correctly. You can see this by looking at the Output List of each operator. For example, the final join only lists one column. SQL Server is very good at removing unneeded projections.
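One way to check the Output List yourself is to ask for the estimated plan as XML instead of running the query. This is only a sketch against the view from the question; in the returned XML, each operator (RelOp element) carries an OutputList element naming the columns it projects.

```sql
-- Return the estimated plan as XML instead of executing the query.
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO
SELECT Column2
FROM   Schema1.Object1;
GO
SET SHOWPLAN_XML OFF;
GO
```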

I think perhaps you are expecting one or more of the joins to be eliminated. This is trickier, because the optimizer has to be careful not to change the meaning of the query. There are four reasons to keep a join, as Rob Farley describes in "JOIN simplification in SQL Server":

1. Extra columns. The join is needed to provide columns needed by the query, either because the column appears in the final result, or it is needed for some intermediate step (like filtering, or a different join).
2. Row duplication. A join can increase the number of rows matched. For example a single row in table A might join with two rows in table B, so the result would contain two copies of the information in the table A row.
3. Row elimination. An inner join can eliminate rows from table A that do not join with any row in table B.
4. Added NULLs. A right or full join can introduce new NULLs where a row in table A does not match a row in table B.

SQL Server is quite good (though not perfect) at removing unnecessary joins where it is safe to do so. A join can only be removed if the optimizer has a guarantee that none of the four join effects above will affect the result.

In your case, it is likely that items #2 and #3 above apply. You may be able to make the view more simplifiable by using left joins instead of inner joins, and perhaps adding a DISTINCT to your outer query. See Rob's article for examples.
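To illustrate the kind of guarantee the optimizer needs, here is a minimal sketch. The tables dbo.Customers and dbo.Orders and the view dbo.OrdersWithCustomer are hypothetical and not taken from the question. Because CustomerId is the primary key of dbo.Customers, the left join can neither duplicate nor eliminate rows, and the final query references no column from dbo.Customers, so I would expect the plan to touch only dbo.Orders. Check the actual plan on your own system rather than taking this for granted.

```sql
-- Hypothetical tables, for illustration only.
CREATE TABLE dbo.Customers
(
    CustomerId int           NOT NULL PRIMARY KEY,  -- uniqueness rules out row duplication
    Name       nvarchar(100) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderId    int NOT NULL PRIMARY KEY,
    CustomerId int NULL
);
GO

CREATE VIEW dbo.OrdersWithCustomer
AS
  SELECT o.OrderId,
         c.Name
  FROM   dbo.Orders AS o
         LEFT JOIN dbo.Customers AS c               -- a left join cannot eliminate Orders rows
           ON c.CustomerId = o.CustomerId;
GO

-- Only OrderId is requested, so none of the four join effects can change
-- the result and the join to dbo.Customers is a candidate for removal.
SELECT OrderId
FROM   dbo.OrdersWithCustomer;
```

An inner join in the same position would, as far as I know, additionally need a trusted foreign key on a NOT NULL column before the optimizer could remove it.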

Related Solutions

Sql-server – Query slower after upgrade from sql server 2008R2 to 2014sp2

I understand your disappointment with the query plan regressions that you experienced. However, Microsoft changed some core assumptions about the cardinality estimator. They could not avoid some query plan regression. To quote Juergen Thomas:

However to state it pretty clearly as well, it was NOT a goal to avoid any regressions compared to the existing CE. The new SQL Server 2014 CE is NOT integrated following the principals of QFEs. This means our expectation is that the new SQL Server 2014 CE will create better plans for many queries, especially complex queries, but will also result in worse plans for some queries than the old CE resulted in.

To answer your first question, the optimizer appears to pick a worse plan with the new CE because of a 1 row cardinality estimate from Object2. This makes a nested loop join very attractive to the optimizer. However, the actual number of rows returned from Object2 was 34182. This means that the estimated cost for the nested loop plan was an underestimate by about 30000X.

The legacy CE gives a 208.733 cardinality estimate from Object2. This is still very far off, but it's enough to give a plan that uses a merge join a lower estimated cost than a nested loop join plan. SQL Server gave the nonclustered index seek on Object3 a cost of 0.0032831. With a nested loop plan under the legacy CE, we could expect a total cost for 208 index seeks to be about 0.0032831 * 208.733 = 0.68529 which is much higher than the final estimated subtree cost for the merge join plan, 0.0171922.

To answer your second question, the cardinality estimate formulas for a query as simple as yours are actually published by Microsoft. I recommend referencing the excellent white paper on differences between the legacy and new CE found here. Focus on why the cardinality estimates are 1 for the new CE and 208.733 for the legacy CE. That's unexpected because the legacy CE assumes independence of filters but the new CE uses exponential backoff. In general for such a query I would expect the new CE to give a larger cardinality estimate for Object2. You should be able to figure out what's going on by looking at the statistics on Object2.
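As a rough illustration of why the new CE would normally give the larger estimate, the two approaches can be compared with made-up numbers. The selectivities and row count below are hypothetical, not taken from your statistics; the formulas are the independence product and the exponential backoff as I understand them from the white paper, with predicates ordered from most to least selective.

```sql
-- Hypothetical selectivities for three AND-ed predicates, most selective first.
DECLARE @s1   float = 0.01,
        @s2   float = 0.05,
        @s3   float = 0.20,
        @rows float = 100000;   -- hypothetical table cardinality

SELECT
    -- Legacy CE: full independence, the selectivities are simply multiplied.
    @rows * @s1 * @s2 * @s3                              AS LegacyEstimate,
    -- New CE: exponential backoff, each further predicate counts for less
    -- (exponents 1, 1/2, 1/4, ...).
    @rows * @s1 * POWER(@s2, 0.5e0) * POWER(@s3, 0.25e0) AS NewCeEstimate;
```

Since every selectivity is between 0 and 1, raising it to a power below 1 makes it larger, so the backoff estimate comes out at or above the independence estimate for the same inputs. That is why an estimate of 1 row from the new CE is surprising here.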

To answer your third question, we can get general strategies from the white paper. The following is an abbreviated quote:

Retain the new CE setting if specific queries still benefit, and “design around” performance issues using alternative methods.

Retain the new CE, and use trace flag 9481 for those queries that had performance degradations directly caused by the new CE.

Revert to an older database compatibility level, and use trace flag 2312 for queries that had performance improvements using the new CE.

Use fundamental cardinality estimate skew troubleshooting methods.

Revert to the legacy CE entirely.
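The trace flag and compatibility level options above could look roughly like this in T-SQL. The table dbo.YourTable and its columns are placeholders, since the real query was not posted; note that QUERYTRACEON normally requires sysadmin permissions.

```sql
-- Keep the new CE, but run one regressed query under the legacy CE.
SELECT t.SomeColumn
FROM   dbo.YourTable AS t        -- placeholder for the regressed query
WHERE  t.OtherColumn = 1
OPTION (QUERYTRACEON 9481);

-- Alternatively, revert the database to the SQL Server 2012 level,
-- which brings back the legacy CE for every query ...
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 110;

-- ... and opt individual queries into the new CE.
SELECT t.SomeColumn
FROM   dbo.YourTable AS t        -- placeholder for a query that improved
WHERE  t.OtherColumn = 1
OPTION (QUERYTRACEON 2312);
```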

For your problem in particular, first I would focus on the statistics. It's not clear to me why the cardinality estimate for an index scan on Object3 would be so far off. I recommend updating statistics with FULLSCAN on all of the involved objects and indexes before doing more tests. Updating the statistics again after changing the CE is also a good step. You should be able to use the white paper to figure out exactly why you're seeing the cardinality estimates that you're seeing.
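A minimal sketch of that statistics check, using placeholder names (dbo.YourTable and IX_YourTable_YourColumn stand in for your real objects):

```sql
-- Rebuild all statistics on the table by reading every row.
-- Repeat for each table involved in the query.
UPDATE STATISTICS dbo.YourTable WITH FULLSCAN;

-- Inspect the header, density vector and histogram behind an estimate.
-- The second argument is the name of an index or statistics object.
DBCC SHOW_STATISTICS ('dbo.YourTable', 'IX_YourTable_YourColumn');
```

Comparing the histogram steps with the predicate values in your query is usually the quickest way to see where an estimate of 1 row comes from.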

I can give you more detailed help if you provide more information. I understand wanting to protect your IP but what you have there is a pretty simple query. Can you change the table and column names and provide the exact query text, relevant table DDL, index DDL, and information about the statistics?

If all else fails and you need to fix it without hints or trace flags, you could try updating to SQL Server 2016 or changing the indexes on your tables. You're unlikely to get the bad nested loop plan if you remove the index, but of course removing an index could affect other queries in a negative way.

Sql-server – SSIS OLE DB Source Editor Data Access Mode: “SQL command” vs “Table or view”

I will run a small experiment, using SQL Profiler to see what happens in the background when an OLE DB Source is used in each of the two modes.

I have a table named dbo.Table_1 that contains 3 columns (ID, name, department).

I used SQL Profiler to trace the database containing this table while using each of the 2 access modes. The results are below:

Table or View - selecting only ID column

The profiler shows that the following command is executed

SELECT * FROM [dbo].[Table_1]

Even if you only select one column, the OLEDB Source reads all data then filters columns after reading them all.

SQL command - selecting only ID column

The profiler shows that the following command is executed

SELECT [ID] FROM [dbo].[Table_1]

I recently published an article that contains more details. You can find it at the following link:

SQLShack - SSIS OLE DB Source: SQL Command vs Table or View
