You can also do this with dynamic SQL without having to manually build out all the column names.
DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);
SELECT @c1 = N'', @c2 = N'';
SELECT
@c1 = @c1 + ',' + QUOTENAME(name),
@c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
FROM sys.columns
WHERE name <> 'LocationID'
AND [object_id] = OBJECT_ID('dbo.table1');
SET @sql = ';WITH s AS (
SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
EXCEPT
SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
' + @c2;
SELECT @sql;
--EXEC sp_executesql @sql;
You can take the output of this query as-is and store it somewhere, or you can comment out the SELECT, uncomment the EXEC, and leave it as permanent dynamic SQL - in which case it will automatically adapt to column changes in the two tables.
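For instance, assuming dbo.table1 has columns LocationID, setting1, and setting2 (hypothetical names), the generated statement would look roughly like this (column order follows sys.columns and is not guaranteed):

```sql
;WITH s AS (
SELECT [setting1],[setting2] FROM dbo.table1
EXCEPT
SELECT [setting1],[setting2] FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
 AND m.[setting1] = s.[setting1] AND m.[setting2] = s.[setting2]
```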
Another idea, assuming LocationID is unique - it occurred to me that you may want to include the master row so you can quickly spot the columns that are different:
;WITH c AS
(
SELECT t.LocationID, m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
)
SELECT DISTINCT src = '> master', setting1, setting2, ...
FROM c
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM c
)
ORDER BY src;
This version is a little cheaper (mostly by avoiding the DISTINCT against the master table), at the cost of needing to specify all of the columns one more time - which, again, you can automate as above:
;WITH m AS
(
SELECT setting1, setting2, ...
FROM dbo.table1_master
),
c AS
(
SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN m
)
SELECT src = '> master', setting1, setting2, ...
FROM m
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT src, setting1, setting2, ...
FROM c
)
ORDER BY src;
However, all of these options perform worse, with worse plans, than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT, even though it is more about syntax than performance.
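For reference, Rachel's LEFT JOIN approach is essentially the following (sketched here with hypothetical column names; it matches what the dynamic SQL below generates):

```sql
SELECT t.LocationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m
  ON t.setting1 = m.setting1
 AND t.setting2 = m.setting2
WHERE m.setting1 IS NULL; -- any non-nullable joined column works for this check
```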
The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:
DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';
SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master');
SELECT TOP (1) @anycol = QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master')
ORDER BY name;
SET @sql = 'SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1'
+ @and + ' WHERE m.' + @anycol + ' IS NULL;';
SELECT @sql;
--EXEC sp_executesql @sql;
Here is a rewrite of what you already have.
- No shredding necessary on root node.
- Specify the text() node for the parameter name and parameter value.

In my limited tests that made it significantly faster. I look forward to seeing what performance gain, if any, you see on your side.
select T.TestId,
T.TestData.value('(/root/@name)[1]', 'varchar(max)') as [root],
C.X.value('@name', 'varchar(max)') as child,
GC.X.value('@name', 'varchar(max)') as grandchild,
null as greatgrandchild,
P.X.value('(name/text())[1]', 'varchar(max)') as parameter
from dbo.TestTable as T
cross apply T.TestData.nodes('/root/child') as C(X)
cross apply C.X.nodes('grandchild') as GC(X)
cross apply GC.X.nodes('parameter') as P(X)
where P.X.value('(value/text())[1]', 'varchar(max)') like '%[^Aa]BC%'
union all
select T.TestId,
T.TestData.value('(/root/@name)[1]', 'varchar(max)') as [root],
C.X.value('@name', 'varchar(max)') as child,
GC.X.value('@name', 'varchar(max)') as grandchild,
GGC.X.value('@name', 'varchar(max)') as greatgrandchild,
P.X.value('(name/text())[1]', 'varchar(max)') as parameter
from dbo.TestTable as T
cross apply T.TestData.nodes('/root/child') as C(X)
cross apply C.X.nodes('grandchild') as GC(X)
cross apply GC.X.nodes('greatgrandchild') as GGC(X)
cross apply GGC.X.nodes('parameter') as P(X)
where P.X.value('(value/text())[1]', 'varchar(max)') like '%[^Aa]BC%'
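To make the assumed document shape concrete, here is a hypothetical row (element and attribute names taken from the XPath expressions above; the values are made up) that the first branch of the query would match:

```sql
INSERT dbo.TestTable (TestId, TestData)
VALUES (1, N'<root name="r1">
  <child name="c1">
    <grandchild name="g1">
      <parameter><name>p1</name><value>XBC</value></parameter>
    </grandchild>
  </child>
</root>');
-- the value XBC satisfies the filter ''%[^Aa]BC%'' (BC preceded by a non-A character)
```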
Update:
I took the liberty of executing the test rig provided by wBob on SQL Server 2014, with compatibility level 110 (SQL Server 2012) and 120 (SQL Server 2014).
Result:
Compatibility level   OP's query   My query   wBob's query using XML indexes
-------------------   ----------   --------   ------------------------------
110                   64 sec       37 sec     1 sec
120                   8 sec        4 sec      5 sec
The difference in execution time is there because SQL Server 2014 uses a new cardinality estimator; at compatibility level 110, SQL Server uses the old estimator. Turning the new cardinality estimator on or off using trace flags yields exactly the same result.
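For example, the cardinality estimator can be forced per query, regardless of compatibility level, with the documented trace flags 9481 (legacy estimator) and 2312 (new estimator):

```sql
-- force the legacy (pre-2014) cardinality estimator for this query only
SELECT ... OPTION (QUERYTRACEON 9481);

-- force the new (2014) cardinality estimator for this query only
SELECT ... OPTION (QUERYTRACEON 2312);
```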
The difference in times for the queries that do not use the XML indexes is there because, at compatibility level 120, the plans are parallel (with DOP 16 in my case).
It is also really interesting to note that execution time when using the XML indexes is five times slower under the new cardinality estimator. The reason is the same as above, only in reverse: there is a parallel plan only when using the old cardinality estimator.
Best Answer
Count the number of rows for every distribution and compare that to the number of rows returned for a distribution where you apply your filter.
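A minimal sketch of that comparison, assuming a table dbo.t distributed on a hypothetical column DistKey (replace the placeholder filter with your own):

```sql
-- total rows per distribution value
SELECT DistKey, COUNT_BIG(*) AS total_rows
FROM dbo.t
GROUP BY DistKey;

-- rows per distribution value that survive the filter; compare against the totals
SELECT DistKey, COUNT_BIG(*) AS filtered_rows
FROM dbo.t
WHERE <your filter>
GROUP BY DistKey;
```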