Imagine the phone company has a list of phone numbers, including who the customer is, where they live, what their billing number is, and so on. The Primary Key could be the phone number.
They give you the White Pages. That's like a non-clustered index, which ordered by Name, including columns like the address.
If you want to find all the Farleys in the book, and are interested in their addresses, then the white pages is all you need. You can quickly Seek to the Farleys (finding the Fs, and so on), and then you have all the information you need.
But if you want their billing numbers, then you need to do a lookup. You can quickly find all the phone numbers of the Farleys, but then you need to take each of them (hundreds) and do another Seek (lookup) in the main (clustered) index, the one that's ordered by the phone number. Each of those is roughly the same cost as the seek to find the Farleys, making your query run orders of magnitude worse.
And there's a threshold. At some point, the database will realise that it's quicker just to go through every page of the clustered index, checking every record to see if it's of interest.
Seriously - get rid of Lookups. Your queries might be fast now, but probably won't scale.
From the background information you have provided it sounds like you have moved the contention point on your server from CPU to IO, as a result of implementing an indexing strategy targeted at specific troublesome queries.
For the most part (without getting into the specifics of what exactly your indexing strategy is) this is expected behaviour because with CPU no longer being a contention point, WAITS are naturally occurring elsewhere.
The performance issues you are seeing now are not necessarily caused by the index modifications you have applied. Further investigation into identifying the specific queries that are responsible for the IO waits/as well as the timeouts that you are seeing is required. You want to review the execution plans for each of these scenarios to identify possible areas for optimization (look for typically undesirable operations first such as implicit conversions, table scans etc.).
Not wanting to be presumptuous however I often encounter query timeouts on customer environments due to poor query plan choices as a result of parameter sniffing. This may also be something to look into.
For some handy queries to use during you investigations take a look at Glenn Berry's SQL Server Diagnostic Queries. There are a number of fantastic queries in there for looking at IO, for example which databases are responsible for the most IO, which database data files are seeing the most activity, which queries cause most reads/writes etc.
Best Answer
The
user_lookups
are lookups in to the index. The original scan or seek will have been on another non covering index.For example
Gives the following plan
Shows 1,000 seeks on the NCI and 1,000 lookups on the CI.
BTW: Just in case you were not aware the
user_lookups
shown here refers to the number of times a plan was executed containing the lookup operator not the number of lookups that actually occurred. e.g. Both of the following increment the counter by 1 despite performing 0 and 2,161 lookups in reality