Sql-server – SQL Server 2008 datetime index performance bug

sql serversql-server-2008

We are using SQL Server 2008 R2, and have a very large (100M+ rows) table with a primary id index, and a datetime column with a nonclustered index. We are seeing some highly unusual client/server behavior based upon the use of an order by clause specifically on a indexed datetime column.

I read through the following post:
https://stackoverflow.com/questions/1716798/sql-server-2008-ordering-by-datetime-is-too-slow
but there is more going on with the client/server than what is begin described here.

If we run the following query (edited to protect some content):

select * 
from [big table] 
where serial_number = [some number] 
order by test_date desc

The query times-out every time. In the SQL Server Profiler the executed query looks like this to the server:

exec sp_cursorprepexec @p1 output,@p2 output,NULL,N'select * .....

Now if you modify the query to, say this:

declare @temp int;
select * from [big table] 
where serial_number = [some number] 
order by test_date desc

The the SQL Server Profiler shows the executed query looks like this to the server, and it WORKS instantly:

exec sp_prepexec @p1 output, NULL, N'declare @temp int;select * from .....

As a matter of fact, you can even put an empty comment ('–;') instead of a unused declare statement and get the same result. So initially we were pointing to the sp pre-processor as the root cause of this issue, but if you do this:

select * 
from [big table] 
where serial_number = [some number] 
order by Cast(test_date as smalldatetime) desc

It works instantly as well (you can cast it as any other datetime type), returning the result in milliseconds. And the profiler shows the request to the server as:

exec sp_cursorprepexec @p1 output, @p2 output, NULL, N'select * from .....

So that somewhat excludes the sp_cursorprepexec procedure from the full cause of the issue. Add to this the fact that the sp_cursorprepexec is also called when no 'order by' is used and the result it also returned instantly.

We've googled around for this issue quite a bit, and I see similar issues posted from others, but none that break it down to this level.

So have others witnessed this behavior? Does anyone have a solution better than putting meaningless SQL in front of the select statement to change the behavior? Being that SQL Server should be invoking the order by after the data is collected, it sure seems like this is a bug in the server that has persisted for a long time. We have found this behavior to be consistent across many of our large tables, and is reproducible.

Edits:

I should also add putting a forceseek in also makes the problem disappear.

I should add to help the searchers, the ODBC timeout error thrown is: [Microsoft][ODBC SQL Server Driver]Operation canceled

Added 10/12/2012:
Still hunting down root cause, (along with having built a sample to give to Microsoft, I will cross post any results here after I submit). I have been digging into the ODBC trace file between a working query (with an added comment/declare statement) and non-working query. The fundamental trace difference is posted below. It occurs on the call to SQLExtendedFetch call after all of the SQLBindCol discussions are completed. The call fails with return code -1, and the parent thread then enters SQLCancel. Since we are able to produce this with both the Native Client and Legacy ODBC drivers, I'm still pointing to some compatibility issue on the server side.

(clip)
MSSQLODBCTester 1664-1718   EXIT  SQLBindCol  with return code 0 (SQL_SUCCESS)
        HSTMT               0x001EEA10
        UWORD                       16 
        SWORD                        1 <SQL_C_CHAR>
        PTR                0x03259030
        SQLLEN                    51
        SQLLEN *            0x0326B820 (0)

MSSQLODBCTester 1664-1718   ENTER SQLExtendedFetch 
        HSTMT               0x001EEA10
        UWORD                        1 <SQL_FETCH_NEXT>
        SQLLEN                     1
        SQLULEN *           0x032677C4
        UWORD *             0x032679B0

MSSQLODBCTester 1664-1fd0   ENTER SQLCancel 
        HSTMT               0x001EEA10

MSSQLODBCTester 1664-1718   EXIT  SQLExtendedFetch  with return code -1 (SQL_ERROR)
        HSTMT               0x001EEA10
        UWORD                        1 <SQL_FETCH_NEXT>
        SQLLEN                     1
        SQLULEN *           0x032677C4
        UWORD *             0x032679B0

        DIAG [S1008] [Microsoft][ODBC SQL Server Driver]Operation canceled (0) 

MSSQLODBCTester 1664-1fd0   EXIT  SQLCancel  with return code 0 (SQL_SUCCESS)
        HSTMT               0x001EEA10

MSSQLODBCTester 1664-1718   ENTER SQLErrorW 
        HENV                0x001E7238
        HDBC                0x001E7B30
        HSTMT               0x001EEA10
        WCHAR *             0x08BFFC5C
        SDWORD *            0x08BFFF08
        WCHAR *             0x08BFF85C 
        SWORD                      511 
        SWORD *             0x08BFFEE6

MSSQLODBCTester 1664-1718   EXIT  SQLErrorW  with return code 0 (SQL_SUCCESS)
        HENV                0x001E7238
        HDBC                0x001E7B30
        HSTMT               0x001EEA10
        WCHAR *             0x08BFFC5C [       5] "S1008"
        SDWORD *            0x08BFFF08 (0)
        WCHAR *             0x08BFF85C [      53] "[Microsoft][ODBC SQL Server Driver]Operation canceled"
        SWORD                      511 
        SWORD *             0x08BFFEE6 (53)

MSSQLODBCTester 1664-1718   ENTER SQLErrorW 
        HENV                0x001E7238
        HDBC                0x001E7B30
        HSTMT               0x001EEA10
        WCHAR *             0x08BFFC5C
        SDWORD *            0x08BFFF08
        WCHAR *             0x08BFF85C 
        SWORD                      511 
        SWORD *             0x08BFFEE6

MSSQLODBCTester 1664-1718   EXIT  SQLErrorW  with return code 100 (SQL_NO_DATA_FOUND)
        HENV                0x001E7238
        HDBC                0x001E7B30
        HSTMT               0x001EEA10
        WCHAR *             0x08BFFC5C
        SDWORD *            0x08BFFF08
        WCHAR *             0x08BFF85C 
        SWORD                      511 
        SWORD *             0x08BFFEE6
(clip)

Added a Microsoft Connect case 10/12/2012:

https://connect.microsoft.com/SQLServer/feedback/details/767196/order-by-datetime-in-odbc-fails-for-clean-sql-statements#details

I should also note that we did look up the query plans for both the functioning, and non-functioning queries. They are both reused appropriately based upon the execution count. Flushing the cached plans and re-running does not change the success of the query.

Best Answer

There's no mystery, you get a good(er) or (really) bad plan at basically random because there is no clear cut choice for the index to use. While compelling for the ORDER BY clause and thus avoid the sort, you non-clustered index on the datetime column is a very poor choice for this query. What would make a much better index for this query would be one on (serial_number, test_date). Even better, this would make a very good candidate for a clustered index key.

As a rule of thumb time series should be clustered by the time column, because the overwhelming majority of requests are interested in specific time ranges. If the data is also inherently partitioned on a column with low selectivity, like it seems to be the case with your serial_number, then this column should be added as the leftmost one in the clustered key definition.