You're not seeing the benefits of parallel execution because both insert methods use single-row inserts, so parallel execution never kicks in!
Parallel execution is only available for bulk operations; lots of tiny operations don't qualify as a set operation. You're inserting rows one by one here, so you should try it with a bulk operation instead (INSERT /*+ APPEND */ INTO ... SELECT, CREATE TABLE ... AS SELECT, and so on) and see whether you can achieve some performance improvement.
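For example, a direct-path, parallel bulk insert might look like the sketch below. The table names and degree of parallelism are placeholders, not taken from the question:

```sql
-- Hypothetical sketch: table names and DOP (4) are placeholders
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(target_table, 4) */ INTO target_table
SELECT /*+ PARALLEL(source_table, 4) */ *
FROM   source_table;

-- A direct-path insert must be committed before the table can be re-read
COMMIT;
```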
You can tell when Oracle executes in parallel because the explain plan will display parallel (PX) operations.
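To check, generate the plan and look for PX steps, along these lines (again a sketch with assumed table names):

```sql
EXPLAIN PLAN FOR
INSERT /*+ APPEND PARALLEL(target_table, 4) */ INTO target_table
SELECT /*+ PARALLEL(source_table, 4) */ * FROM source_table;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- A parallel plan contains operations such as PX COORDINATOR,
-- PX SEND QC (RANDOM) and LOAD AS SELECT
```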
Using T-SQL scalar functions will frequently lead to performance problems* because SQL Server makes a separate function call (using a whole new T-SQL context) for each row. In addition, parallel execution is disallowed for the whole query.
T-SQL scalar functions can also make it difficult to troubleshoot performance problems (whether those problems are caused by the function or not). The function appears as a 'black box' to the query optimizer: it is assigned a fixed low estimated cost, regardless of the actual content of the function.
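For illustration, even a trivial scalar function of this shape (a hypothetical example, not the function from the question) is called once per row and prevents a parallel plan for any query that uses it:

```sql
-- Hypothetical T-SQL scalar function: invoked separately per row,
-- assigned a fixed low cost estimate, and blocks parallelism
CREATE FUNCTION dbo.TryToSmallDateTime (@s varchar(12))
RETURNS smalldatetime
AS
BEGIN
    RETURN TRY_CONVERT(smalldatetime, @s);
END;
```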
You will probably be better off using the new TRY_CONVERT function in SQL Server 2012:
SELECT
InsertID,
dt1 = TRY_CONVERT(smalldatetime, MangledDateTime1),
dt2 = TRY_CONVERT(smalldatetime, MangledDateTime2),
dt3 = TRY_CONVERT(smalldatetime, MangledDateTime3)
FROM dbo.RawData;
╔══════════╦═════════════════════╦═════════════════════╦═════════════════════╗
║ InsertID ║ dt1 ║ dt2 ║ dt3 ║
╠══════════╬═════════════════════╬═════════════════════╬═════════════════════╣
║ 1 ║ 2000-10-10 00:00:00 ║ NULL ║ NULL ║
║ 1 ║ NULL ║ 2013-06-30 00:00:00 ║ NULL ║
║ 1 ║ NULL ║ NULL ║ 2013-06-30 00:00:00 ║
╚══════════╩═════════════════════╩═════════════════════╩═════════════════════╝
After the edit to the question
I see the function contains some specific logic. You could still look to use TRY_CONVERT as part of that, but you should definitely convert the scalar function to an in-line function. In-line functions (RETURNS TABLE) use a single SELECT statement and are expanded into the calling query and fully optimized, in much the same way views are. It can be helpful to think of in-line functions as parameterized views.
For example, an approximate translation of the scalar function to an in-line version is:
CREATE FUNCTION dbo.CleanDate
(@UnformattedString varchar(12))
RETURNS TABLE
AS RETURN
SELECT Result =
-- Successful conversion or NULL after
-- workarounds applied in CROSS APPLY
-- clauses below
TRY_CONVERT(smalldatetime, ca3.string)
FROM
(
-- Logic starts here
SELECT
CASE
WHEN @UnformattedString IS NULL
THEN NULL
WHEN LEN(@UnformattedString) <= 1
THEN NULL
WHEN LEN(@UnformattedString) = 12
THEN LEFT(@UnformattedString, 8)
ELSE @UnformattedString
END
) AS Input (string)
CROSS APPLY
(
-- Next stage using result so far
SELECT
CASE
WHEN @UnformattedString = '20000000'
THEN '20790606'
ELSE Input.string
END
) AS ca1 (string)
CROSS APPLY
(
-- Next stage using result so far
SELECT CASE
WHEN LEFT(ca1.string, 2) = '00' THEN '20' + RIGHT(ca1.string, 6)
WHEN LEFT(ca1.string, 2) = '18' THEN '19' + RIGHT(ca1.string, 6)
WHEN LEFT(ca1.string, 2) = '19' THEN ca1.string
WHEN LEFT(ca1.string, 2) = '20' THEN ca1.string
WHEN LEN(ca1.string) <> 6 THEN '20' + RIGHT(ca1.string, 6)
ELSE ca1.string
END
) AS ca2 (string)
CROSS APPLY
(
-- Next stage using result so far
SELECT
CASE
WHEN TRY_CONVERT(integer, LEFT(ca2.string, 4)) > YEAR(GETDATE())
THEN '20790606'
WHEN YEAR(GETDATE()) - TRY_CONVERT(integer, LEFT(ca2.string, 4)) >= 100
THEN '20790606'
ELSE ca2.string
END
) AS ca3 (string);
The function used on the sample data:
SELECT
InsertID,
Result1 = CD1.Result,
Result2 = CD2.Result,
Result3 = CD3.Result
FROM dbo.RawData AS RD
CROSS APPLY dbo.CleanDate(RD.MangledDateTime1) AS CD1
CROSS APPLY dbo.CleanDate(RD.MangledDateTime2) AS CD2
CROSS APPLY dbo.CleanDate(RD.MangledDateTime3) AS CD3;
Output:
╔══════════╦═════════════════════╦═════════════════════╦═════════════════════╗
║ InsertID ║ Result1 ║ Result2 ║ Result3 ║
╠══════════╬═════════════════════╬═════════════════════╬═════════════════════╣
║ 1 ║ 2000-10-10 00:00:00 ║ 2079-06-06 00:00:00 ║ NULL ║
║ 1 ║ 2079-06-06 00:00:00 ║ 2013-06-30 00:00:00 ║ 2079-06-06 00:00:00 ║
║ 1 ║ 2000-10-10 00:00:00 ║ 2079-06-06 00:00:00 ║ 2013-06-30 00:00:00 ║
╚══════════╩═════════════════════╩═════════════════════╩═════════════════════╝
*CLR scalar functions have a much faster invocation path than T-SQL scalar functions and do not prevent parallelism.
Best Answer
If a function contains a query that uses a CTE, and that function is used by a parallel worker process, the query and the CTE will only be executed in the private process context of that parallel worker. There is no shared state involved in creating or scanning the CTE.
So it is safe to mark the function PARALLEL SAFE. The quotation from the documentation says that a CTE defined in a query that should be parallelized can only be scanned by the leader process, not by the parallel workers, precisely for the reason that the CTE is not shared between processes. This has no impact on CTEs defined in queries running in functions called by parallel workers, because such nested queries won't be parallelized anyway: