Parameter sniffing is your friend almost all of the time, and you should write your queries so that it can be used. Parameter sniffing lets SQL Server build the plan using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used at compile time are not optimal for the queries to come.
The queries in a stored procedure are compiled when the stored procedure is executed, not when each individual query runs, so consider what SQL Server has to deal with here:
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate);
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate;
END
At compile time there is a known value for @endDate and an unknown value for @startDate. That leaves SQL Server guessing at 30% of the rows returned for the filter on @startDate, combined with whatever the statistics tell it about @endDate. On a big table with a lot of rows, that could give you a scan operation where you would benefit most from a seek.
Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled, so it can use known values for both @endDate and @startDate.
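For illustration, here is a minimal sketch of that pattern; the DateRangeProc signature is an assumption, since that procedure comes from the question:

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate);
  -- By the time DateRangeProc is compiled, both arguments are plain values,
  -- so parameter sniffing sees two known dates.
  EXEC DateRangeProc @startDate = @startDate, @endDate = @endDate;
END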
Both of your dynamic queries lead to the same thing: the values are known at compile time.
The one with a default null value is a bit special. The values known to SQL Server at compile time are a known value for @endDate and null for @startDate. Using a null in a BETWEEN will give you 0 rows, but SQL Server always guesses at 1 row in those cases. That might be a good thing here, but if you call the stored procedure with a large date interval where a scan would have been the best choice, it may end up doing a bunch of seeks.
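A quick way to see the zero-row behavior (reusing the Sale table from above):

-- NULL in a range predicate evaluates to UNKNOWN, so no rows qualify,
-- yet the optimizer estimates 1 row for this filter.
SELECT SaleDate
FROM Sale
WHERE SaleDate BETWEEN NULL AND GETDATE();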
I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use and there is something strange with it as well.
First off, SQL Server does not call the function multiple times when it is used in the where clause. DATEADD is considered runtime constant.
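As a sketch of that variant (a hypothetical WeeklyProcDirect reusing the Sale table from above; the column list is a stand-in):

CREATE PROCEDURE WeeklyProcDirect(@endDate DATE)
AS
BEGIN
  SELECT SaleDate  -- stand-in for the real column list
  FROM Sale
  -- DATEADD is evaluated once per query, not once per row:
  WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate;
END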
And I would have thought that DATEADD is evaluated when the query is compiled, so that you would get a good estimate of the number of rows returned. But it is not so in this case. SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012), so in your case the estimate will be the number of rows registered for @endDate. Why it does that I don't know, but it has to do with the use of the datatype DATE. Shift to DATETIME in the stored procedure and the table, and the estimate will be accurate; that is, DATEADD is considered at compile time for DATETIME but not for DATE.
So to summarize this rather lengthy answer, I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the query, without the hassle of using dynamic SQL.
PS: In the comments you got two suggestions. OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of the rows returned, and OPTION (RECOMPILE) will make SQL Server see the parameter values, since the query is recompiled every time.
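To show where those hints would sit in the query from the procedure above (a sketch, not a full procedure):

-- OPTIMIZE FOR UNKNOWN: the plan is built for a generic value,
-- which lands the estimate at roughly 9% of the table for this filter.
SELECT SaleDate
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
OPTION (OPTIMIZE FOR UNKNOWN);

-- RECOMPILE: the statement is recompiled on every execution,
-- so the actual parameter values are visible each time.
SELECT SaleDate
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
OPTION (RECOMPILE);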
The (formerly) accepted answer was incorrect, as it was a bad and misleading test. The two queries being compared do not do the same thing, due to a simple typo that keeps them from being an apples-to-apples comparison. The test in the accepted answer is unfairly biased in favor of the CAST operation. The issue is that the CONVERT operation is being done with convert(date, GETDATE()+num, 20) -- a value to convert that changes per row -- while the CAST operation is being done with a simple cast(GETDATE() as date) -- a value to convert that is consistent across all rows and is replaced in the execution plan with a constant. In fact, looking at the XML execution plan even shows the actual operation performed as being CONVERT(date,getdate(),0)!
Insofar as my testing shows (after making them equal by using cast(GETDATE()+num as date)), the times vary, with them being mostly the same (which makes sense if they are both reduced to CONVERT anyway) or CONVERT winning:
SET STATISTICS IO, TIME ON;
;with t as (
  select convert(date, GETDATE(), 20) as fecha, 0 as num
  union all
  select convert(date, GETDATE() + num, 20) as fecha, num + 1 from t where num < 1000000
)
select max(fecha)
from t
option (maxrecursion 0);
SET STATISTICS IO, TIME OFF;
-- 4754-07-23
-- Table 'Worktable'. Scan count 2, logical reads 6000008, physical reads 0, read-ahead reads 0
-- SQL Server Execution Times:
--   CPU time = 9031 ms, elapsed time = 9377 ms.

-- VS

SET STATISTICS IO, TIME ON;
;with t as (
  select cast(GETDATE() as date) as fecha, 0 as num
  union all
  select cast(GETDATE() as date) as fecha, num + 1 from t where num < 1000000
)
select max(fecha)
from t
option (maxrecursion 0);
SET STATISTICS IO, TIME OFF;
-- 2016-08-26
-- Table 'Worktable'. Scan count 2, logical reads 6000008, physical reads 0, read-ahead reads 0
-- SQL Server Execution Times:
--   CPU time = 8969 ms, elapsed time = 9302 ms.

SET STATISTICS IO, TIME ON;
;with t as (
  select cast(GETDATE() as date) as fecha, 0 as num
  union all
  select cast(GETDATE() + num as date) as fecha, num + 1 from t where num < 1000000
)
select max(fecha)
from t
option (maxrecursion 0);
SET STATISTICS IO, TIME OFF;
-- 4754-07-23
-- Table 'Worktable'. Scan count 2, logical reads 6000008, physical reads 0, read-ahead reads 0
-- SQL Server Execution Times:
--   CPU time = 9438 ms, elapsed time = 9878 ms.
The main difference between CAST and CONVERT is that CONVERT allows the "style" to be specified. The "style" not only allows for tailoring the output when converting a non-string to a string, but also allows for specifying the input format when converting a string to a non-string:
SELECT CONVERT(DATE, '5/10/2016', 101); -- 101 = mm/dd/yyyy
-- 2016-05-10
SELECT CONVERT(DATE, '5/10/2016', 103); -- 103 = dd/mm/yyyy
-- 2016-10-05
Now compare that functionality with CAST:
SELECT CAST('13/5/2016' AS DATE);
-- Msg 241, Level 16, State 1, Line 71
-- Conversion failed when converting date and/or time from character string.
SELECT CONVERT(DATE, '13/5/2016', 101); -- 101 = mm/dd/yyyy
-- Msg 241, Level 16, State 1, Line 76
-- Conversion failed when converting date and/or time from character string.
SELECT CONVERT(DATE, '13/5/2016', 103); -- 103 = dd/mm/yyyy
-- 2016-05-13
One additional thing to mention about CAST: because it does not have the "style" parameter, the format of the date string passed in is assumed to be that of the current culture (a session property). The current culture is denoted by the @@LANGID and @@LANGUAGE system variables. This means that the CAST statement that failed in the test directly above could succeed under a different culture / language. The following tests show this behavior, and how that same date string does work with CAST when the current language is "French" (and it would work with several others, based on the values in the dateformat column in sys.syslanguages):
IF (@@LANGID <> 0) -- us_english
BEGIN
  PRINT 'Changing LANGUAGE to English...';
  SET LANGUAGE ENGLISH;
  SELECT @@LANGUAGE AS [CurrentLanguage], @@LANGID AS [LangID];
END;

SELECT @@LANGUAGE, CAST('13/5/2016' AS DATE) AS [Test 1];
-- Msg 241, Level 16, State 1, Line 71
-- Conversion failed when converting date and/or time from character string.
GO

SELECT @@LANGUAGE, CONVERT(DATE, '13/5/2016', 103) AS [Test 2]; -- 103 = dd/mm/yyyy
-- us_english    2016-05-13
GO

IF (@@LANGID <> 2) -- Français
BEGIN
  PRINT 'Changing LANGUAGE to French...';
  SET LANGUAGE FRENCH;
  SELECT @@LANGUAGE AS [CurrentLanguage], @@LANGID AS [LangID];
END;

SELECT @@LANGUAGE, CAST('13/5/2016' AS DATE) AS [Test 3];
-- 2016-05-13
GO

SELECT @@LANGUAGE, CONVERT(DATE, '13/5/2016', 103) AS [Test 4]; -- 103 = dd/mm/yyyy
-- Français    2016-05-13
GO

-- Reset current language, if necessary.
IF (@@LANGID <> @@DEFAULT_LANGID)
BEGIN
  DECLARE @Language sysname;

  SELECT @Language = sl.[alias]
  FROM sys.syslanguages sl
  WHERE sl.[langid] = @@DEFAULT_LANGID;

  PRINT N'Changing LANGUAGE back to default: ' + @Language + N'...';
  SET LANGUAGE @Language;
  SELECT @@LANGUAGE AS [CurrentLanguage], @@LANGID AS [LangID];
END;
Best Answer
So, you're going to be parsing strings and ints and doing all kinds of conversions, so I don't think you're going to really save anything by seeking out a "more efficient" way to turn crap data into good data - even on billions of rows. An alternative might be to avoid converting anything to a string at all (which is notoriously bad for performance):
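A minimal sketch of that idea, assuming (these shapes are assumptions, not from the question) the feed supplies the date as an integer like 20160826 and the time as an integer count of milliseconds since midnight:

DECLARE @d int = 20160826,  -- hypothetical date encoded as YYYYMMDD
        @t int = 34200123;  -- hypothetical milliseconds since midnight

-- Pure integer math plus DATEADD; no string conversions anywhere.
-- The literal 0 implicitly converts to the datetime base date 1900-01-01.
SELECT DATEADD(MILLISECOND, @t,
       DATEADD(DAY,   @d % 100 - 1,
       DATEADD(MONTH, (@d / 100) % 100 - 1,
       DATEADD(YEAR,  @d / 10000 - 1900, CONVERT(datetime, 0)))));
-- 2016-08-26 09:30:00.123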
Of course, keep in mind that the milliseconds your trading system passes in can round in either direction when using datetime (.998 isn't possible and becomes .997, for example), so you may want instead:
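For instance, a variant of the same sketch that switches to datetime2(3) before adding the time portion, so the millisecond survives intact:

DECLARE @d int = 20160826,
        @t int = 34200998;  -- a millisecond value that datetime would round to .997

SELECT DATEADD(MILLISECOND, @t,
       CAST(DATEADD(DAY,   @d % 100 - 1,
            DATEADD(MONTH, (@d / 100) % 100 - 1,
            DATEADD(YEAR,  @d / 10000 - 1900, CONVERT(datetime, 0)))) AS datetime2(3)));
-- 2016-08-26 09:30:00.998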
I ran these each 100,000 times, and compared to your existing approach. The difference?
Now, this should not be extrapolated to mean that on 1 billion rows my method will save you (13,000 or 31,000) seconds, as the translation of 100,000 batches to operating on 100,000 rows in a single statement will not be linear. But it could be worth testing (you have the data necessary to test this; I would just be making stuff up, and I also probably have processors that have different performance characteristics when it comes to floating point operations etc., so anything else I could offer would not necessarily be relevant).
I still think you're better off taking the hit, fixing the design, and intercepting the incoming trading data and correcting it. As this system gets bigger, these translations and conversions are going to hurt more, and more, and more...