Sql-server – Covering Index Question

indexindex-tuningperformancequery-performancesql servert-sql

I have the following query

Select  Pt.PRODID, PT.INVENTREFID,  Inv.ItemName, PT.ItemID, Configid,  Pt.QTYSCHED,
PT.DLVDATE,  Pt.CREATEDDATETIME, pt.SCHEDEND,
 CASE Left(PT.INVENTREFID, 3)
     WHEN 'SJB' THEN ST.SALESNAME
     WHEN 'WJB' THEN 'Sub - Assembly'
     ELSE 'Stock'
  END as CustomerName
from Dynamicsv5Realtime.dbo.PRODTABLE PT
Join Dynamicsv5Realtime.dbo.Inventdim ID
On PT.InventdimId = ID.InventdimID
and PT.Dataareaid = ID.dataareaid
Join Dynamicsv5Realtime.dbo.INVENTTABLE Inv
On Inv.itemid = PT.ItemId
Left Join Dynamicsv5Realtime.dbo.SALESTABLE ST
ON ST.SalesId = PT.INVENTREFID
and ST.Dataareaid = PT.dataareaid
where pt.PRODSTATUS in(2,3,4)
and PT.DATAAREAID = 'AJB'
Order by 7

when I was looking to improve it I came up with the following index

CREATE NONCLUSTERED INDEX [INDEX_2]
ON [dbo].[PRODTABLE] ([INVENTREFID],[DATAAREAID],[INVENTDIMID],[PRODSTATUS])
INCLUDE ([PRODID],[ItemID],[QTYSCHED],[DLVDATE],[CREATEDDATETIME],[SCHEDEND])

However this was not as efficient as the following which the missing index DMV came up with

CREATE NONCLUSTERED INDEX [INDEX_1]
ON [dbo].[PRODTABLE] ([DATAAREAID],[PRODSTATUS])
INCLUDE ([ITEMID],[QTYSCHED],[DLVDATE],[SCHEDEND],[INVENTREFID],[PRODID],[INVENTDIMID],    [CREATEDDATETIME])

What I am wondering is when is it more efficient to put columns in the index itself rather than in the Include part of the index?

May thanks

Best Answer

Very generally and non-technically, you should consider using the column as part of the index key if you are going to use that column in the WHERE clause (or otherwise filter/seek), and INCLUDE if it is merely being used to avoid a lookup (I like to call this "coming along for the ride"). Of course there are always exceptions:

you can't always include columns in the key due to size;
it may not matter due to cardinality; or,
it might not affect the plan due to where you would have to put it in the key to allow the index to continue satisfying other query semantics.

There are also cases where an INCLUDEd column can satisfy a filter just fine, and plenty of cases where changing the index might help this specific query but it might wreak havoc on the rest of your workload. You should always test index changes against an entire workload representing as much of a full business cycle as possible, rather than rely on the missing indexes DMV or your own analysis which, it seems, is focused on a single query.

I'm sure Paul White will come along with a much more elaborate and technically accurate response.

Related Solutions

Sql-server – computed columns, index, clustered index and covering index

Computed Columns can be stored in the data page in one of two ways. Either by creating them as PERSISTED or by including them in the clustered index definition.

If they are included in the clustered index definition then even if the columns are not marked as PERSISTED then the values will still be stored in each row. These index key values will additionally be stored in the upper level pages.

If the computed column is imprecise (e.g. float) or not verifiable as deterministic (e.g. CLR functions) then it is a requirement for the column to be marked as PERSISTED in order to be made part of an index key.

So to give an example

CREATE TABLE T
(
A INT,
C1 AS REPLICATE(CHAR(A),100) PERSISTED,
C2 AS REPLICATE(CHAR(A),200),
C3 AS CAST(A AS FLOAT) PERSISTED,
C4 AS CAST(A + 1 AS FLOAT)
)

CREATE UNIQUE CLUSTERED INDEX IX ON T(C2,C3)

C1 will be stored in just the data page rows as it is marked as PERSISTED but not indexed.
C2 will be stored in both the rows on the data page and the index higher levels as it is an index key column.
C3 will be stored as for C2. As it is imprecise it is a requirement to mark it as PERSISTED however.
C4 won't be stored anywhere as it is neither marked as PERSISTED nor indexed.

Similarly all computed columns referenced in non clustered index definitions as key columns need to be stored at all levels of the index as they are part of the index key. There is the same requirement regarding precise/deterministic results.

Fails

CREATE NONCLUSTERED INDEX IX2 ON T(A,C4)

With the error.

Cannot create index or statistics 'IX2' on table 'T' because the computed column 'C4' is imprecise and not persisted. Consider removing column from index or statistics key or marking computed column persisted.

To include it as part of the non clustered index key it must also be stored in the clustered index data pages. However

Succeeds.

CREATE NONCLUSTERED INDEX IX2 ON T(A) INCLUDE (C4)

Computed columns that are only INCLUDEd columns are persisted to the NCI leaf page and do not have the requirement that they also be persisted in the data page.

Is the WHERE-JOIN-ORDER-(SELECT) Rule for Index Column Order Incorrect?

Is the WHERE-JOIN-ORDER-(SELECT) rule for index column order wrong?

At the least it is incomplete and potentially misleading advice (I didn't bother to read the whole article). If you're going to read stuff on the Internet (including this), you should adjust your amount of trust according to how well you already know and trust the author, but always then verify for yourself.

There are a number of "rules of thumb" for creating indexes, depending on the exact scenario, but none are really a good substitute for understanding the core issues for yourself. Read up on the implementation of indexes and execution plan operators in SQL Server, go through some exercises, and come to a good solid understanding of how indexes can be used to make execution plans more efficient. There is no effective shortcut to attaining this knowledge and experience.

In general, I can say that your indexes should most often have columns used for equality tests first, with any inequalities last, and/or provided by a filter on the index. This is not a complete statement, because indexes can also provide order, which may be more useful than seeking directly to one or more keys in some situations. For example, ordering can be used to avoid a sort, to reduce the cost of a physical join option like merge join, to enable a stream aggregate, find the first few qualifying rows quickly...and so on.

I'm being a little vague here, because selecting the ideal index(es) for a query depends on so many factors - this is a very broad topic.

Anyway, it is not unusual to find conflicting signals for the 'best' indexes in a query. For example, your join predicate would like rows ordered one way for a merge join, the group by would like rows sorted another way for a stream aggregate, and finding the qualifying rows using the where clause predicates would suggest other indexes.

The reason indexing is an art as well as science is that an ideal combination is not always logically possible. Choosing the best compromise indexes for the workload (not just a single query) requires analytic skills, experience, and system-specific knowledge. If it were easy, the automated tools would be perfect, and performance-tuning consultants would be much less in demand.

As far as missing index suggestions are concerned: these are opportunistic. The optimizer brings them to your attention when it tries to match predicates and required sort order to an index that does not exist. The suggestions are therefore based on particular matching attempts in the specific context of the particular sub-plan variation it was considering at the time.

In context, the suggestions always make sense, in terms of reducing the estimated cost of data access, according to the optimizer's model. It does not do a wider analysis of the query as a whole (much less the wider workload), so you should think of these suggestions as a gentle hint that a skilled person needs to look at the available indexes, with the suggestions as a starting point (and usually no more than that).

In your case, the (Status) INCLUDE (ID) suggestion probably came about when it was looking at the possibility of a hash or merge join (example later). In that narrow context, the suggestion makes sense. For the query as a whole, maybe not. The index (ID, Status) enables a nested loop join with ID as an outer reference: equality seek on ID and inequality on Status per iteration.

One possible selection of indexes is:

CREATE INDEX i1 ON dbo.I (ID, [Status]);
CREATE INDEX i1 ON dbo.IP (Deleted, OPID, IID) INCLUDE (Q);

...which produces a plan like:

I am not saying these indexes are optimal for you; they happen to work to produce a reasonable-looking plan to me, without being able to see statistics for the tables involved, or the full definitions and existing indexing. Also, I know nothing of the wider workload or real query.

Alternatively (just to show one of the myriad additional possibilities):

CREATE INDEX i1 ON dbo.I ([Status]) INCLUDE (ID);
CREATE INDEX i1 ON dbo.IP (Deleted, IID, OPID) INCLUDE (Q);

Gives:

Execution plans were generated using SQL Sentry Plan Explorer.

Best Answer

Related Solutions

Sql-server – computed columns, index, clustered index and covering index

Is the WHERE-JOIN-ORDER-(SELECT) Rule for Index Column Order Incorrect?

Related Question