There's no mystery: you get a good(er) or (really) bad plan essentially at random, because there is no clear-cut choice of index to use. While compelling for the ORDER BY clause (it avoids the sort), your non-clustered index on the datetime column is a very poor choice for this query. A much better index for this query would be one on (serial_number, test_date). Even better, this would make a very good candidate for the clustered index key.
As a rule of thumb, time series should be clustered by the time column, because the overwhelming majority of requests are interested in specific time ranges. If the data is also inherently partitioned on a column with low selectivity, as seems to be the case with your serial_number, then that column should be added as the leftmost one in the clustered key definition.
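As a rough sketch of what that could look like (dbo.TestResults is just a placeholder name for your actual table, and this assumes no conflicting clustered index already exists):

CREATE CLUSTERED INDEX CIX_TestResults_serial_test_date
    ON dbo.TestResults (serial_number, test_date);

With that key, a query filtering on a given serial_number and a test_date range becomes a single seek over a contiguous range of the clustered index, and the ORDER BY test_date is satisfied with no sort.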
You're going to be parsing strings and ints and doing all kinds of conversions, so I don't think you're going to save much by seeking out a "more efficient" way to turn crap data into good data - even on billions of rows. An alternative might be to avoid converting anything to a string at all (string conversions are notoriously bad for performance):
DECLARE @dt DATETIME = '20130904', @t INT = 192423998; -- @t encodes HHMMSSmmm (19:24:23.998)

SELECT DATEADD(MILLISECOND, @t % 100000   -- seconds + milliseconds, as milliseconds
  + 60000 * (@t / 100000 % 100)           -- minutes converted to milliseconds
  + 60000 * 60 * (@t / 10000000),         -- hours converted to milliseconds
  @dt);
Of course, keep in mind that the milliseconds your trading system passes in can round in either direction when stored as datetime (.998 isn't possible and becomes .997, for example), so you may want this instead:
SELECT DATEADD(MILLISECOND, @t % 100000   -- seconds + milliseconds, as milliseconds
  + 60000 * (@t / 100000 % 100)           -- minutes converted to milliseconds
  + 60000 * 60 * (@t / 10000000),         -- hours converted to milliseconds
  CONVERT(DATETIME2(3), @dt));
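If you want to see that rounding behaviour for yourself, a quick illustrative check of the two types (just a sketch, using the same .998 value) shows the difference:

SELECT CAST('2013-09-04 19:24:23.998' AS DATETIME);     -- returns 2013-09-04 19:24:23.997
SELECT CAST('2013-09-04 19:24:23.998' AS DATETIME2(3)); -- returns 2013-09-04 19:24:23.998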
I ran each of these 100,000 times and compared them to your existing approach. The difference?
Your approach: 41.8 seconds
My approach (no convert): 40.5 seconds
My approach (with convert): 38.9 seconds
Now, this should not be extrapolated to mean that on 1 billion rows my method will save you 13,000 or 29,000 seconds (1 billion rows is 10,000 of those 100,000-execution batches, at 1.3 or 2.9 seconds saved per batch), as the translation of 100,000 single-row batches to operating on 100,000 rows in a single statement will not be linear. But it could be worth testing (you have the data necessary to test this; I would just be making stuff up, and my processors probably also have different performance characteristics when it comes to this kind of arithmetic, so anything else I could offer would not necessarily be relevant).
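For reference, the set-based form of the same expression might look something like this (dbo.Trades, trade_date and trade_time are placeholder names; substitute your actual table and columns):

SELECT DATEADD(MILLISECOND,
         t.trade_time % 100000                    -- seconds + milliseconds
       + 60000 * (t.trade_time / 100000 % 100)    -- minutes converted to milliseconds
       + 60000 * 60 * (t.trade_time / 10000000),  -- hours converted to milliseconds
         CONVERT(DATETIME2(3), t.trade_date)) AS trade_datetime
FROM dbo.Trades AS t;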
I still think you're better off taking the hit, fixing the design, and intercepting the incoming trading data and correcting it. As this system gets bigger, these translations and conversions are going to hurt more, and more, and more...
Best Answer
datetime is not precise to the level of 1 millisecond; it is only accurate to increments of .000, .003, or .007 seconds. What you are asking for is not possible unless you change to a different data type (i.e. datetime2). See the documentation.
Important quote: