SQL Server – Query runs slowly when a non-indexed column is added to the WHERE clause

Tags: azure-sql-database, sql-server

My SQL Server houses a logs table that has 4 columns:

  • RowID – GUID PRIMARY KEY
  • OperationID – GUID (NONUNIQUE INDEXED)
  • Line – INT
  • Message – VARCHAR(250)

Some table facts:

  • An operation may have anywhere between 1 and 380,000 messages
  • OperationID is the same for all log lines that pertain to a particular operation
  • Line increments from 1 for each OperationID (hence the combination of OperationID and Line is unique; I don't know why the original table designer added a GUID PK)
  • The message varies considerably
  • A nonclustered nonunique index is present on OperationID. It INCLUDEs no other columns (see the DDL sketch just after this list)
  • The table has about 13.5 million rows
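
Based on the two lists above, the table and index presumably look something like this. This is a hypothetical reconstruction: names like IX_logs_OperationID are mine, and the exact types and nullability may differ:

CREATE TABLE dbo.logs
(
    RowID       UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,  -- clustered by default
    OperationID UNIQUEIDENTIFIER NOT NULL,
    Line        INT NOT NULL,
    Message     VARCHAR(250) NULL
);

-- Nonclustered, nonunique, no INCLUDEd columns
CREATE NONCLUSTERED INDEX IX_logs_OperationID
    ON dbo.logs (OperationID);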

I find the following query reasonably performant: it returns 7,000 rows in 5 seconds or less. The query plan shows an index seek for all rows matching the OperationID, a key lookup on the primary key, and then a nested loops join to produce the result set:

SELECT *  
FROM logs 
WHERE operationid = <some guid>

Side info: SQL Server Management Studio complains that the query would be better served if an index were created on OperationID that INCLUDEd Line and Message.
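
For reference, that suggestion corresponds to something like the following (the index name is my own invention; SSMS generates its own):

CREATE NONCLUSTERED INDEX IX_logs_OperationID_incl
    ON dbo.logs (OperationID)
    INCLUDE (Line, Message);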

If I add something that I consider trivial:

SELECT * 
FROM logs 
WHERE operationid = <some guid> AND Line > 2000

Suddenly the query takes well over 3 minutes to return a result. My immediate thought was to look at the differences between the execution plans. The only significant difference I can find is that, in addition to the "seek predicate" that associates the table rows with index rows, the non-performant query's plan shows a "predicate" section in the operator's tooltip. The "predicate" is listed as "Line > 2000".
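
If you want numbers rather than tooltips, comparing the logical reads and CPU time of the two variants makes the difference concrete. A minimal sketch, where @op and the zero GUID are placeholders for a real OperationID:

DECLARE @op UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000000';

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM logs WHERE OperationID = @op;                  -- fast variant
SELECT * FROM logs WHERE OperationID = @op AND Line > 2000;  -- slow variant

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;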

Side info: SQL Server has switched to complaining that the query would be better served by indexing OperationID and Line and INCLUDEing Message.
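
Again for reference, that second suggestion corresponds to something like this (hypothetical name):

CREATE NONCLUSTERED INDEX IX_logs_OperationID_Line
    ON dbo.logs (OperationID, Line)
    INCLUDE (Message);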

Is SQL Server doing things in the wrong order? That is, is it looking through all 13.5 million rows for those whose Line > 2000, and then filtering and joining to the index?

As a human I can see this query should be sub-5-second, because filtering the rows on the indexed column first (13.5M -> 7K) and then filtering those 7K rows on Line would be a better strategy than finding the potentially millions of rows with Line > 2000. I was expecting that filtering a large table on an indexed column is something SQL Server would always prefer over starting with the non-indexed columns in a WHERE clause?

I don't really need an index on Line, as we don't commonly select entries based on line ranges; this is just for code development, and I know that for the particular operation I'm analysing now, all line entries before 2000 are uninteresting. I'm more interested in why a query with WHERE indexedcol = value is super fast, but WHERE indexedcol = value AND unindexedcol = othervalue runs like a dog. And by that I don't mean a greyhound.

Best Answer

You're getting a secondary predicate on the key lookup operation. That's going to slow things down. It's not doing a full scan, but remember that even seeks are just limited scans. So the key lookup is still doing what it did before, but now it also has a filtering step to perform because of that added predicate on Line.
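
To make that plan shape concrete, here is a rough hand-written equivalent of what the operators are doing. This is only an illustration of the logical steps, not how the engine literally executes it (@op is a placeholder GUID):

DECLARE @op UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000000';

SELECT l.*
FROM (SELECT RowID
      FROM dbo.logs
      WHERE OperationID = @op) AS seek  -- index seek: gather the ~7,000 matching keys
JOIN dbo.logs AS l
    ON l.RowID = seek.RowID             -- key lookup into the clustered index, row by row
WHERE l.Line > 2000;                    -- residual predicate, tested on each fetched row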

You could look at a covering index. However, because you have a SELECT *, covering means including all the columns, effectively creating a second copy of the clustered index. There'll be some sacrifice of performance on data modification queries and of disk space (depending on where you are on Azure SQL Database, that could bump you to another tier).

Instead, I'd start by experimenting with your existing nonclustered index. See whether adding the new column, Line, to the key of that index helps at all. Keep it as the second column; otherwise you're changing the histogram, which could cause other issues.
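
Something like this keeps OperationID as the leading column while adding Line to the key. I'm assuming the existing index is named IX_logs_OperationID, so substitute the real name:

CREATE NONCLUSTERED INDEX IX_logs_OperationID
    ON dbo.logs (OperationID, Line)
    WITH (DROP_EXISTING = ON);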

You're going to have to experiment to see for sure what works best in your situation.