You can also do this with dynamic SQL without having to manually build out all the column names.
DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);
SELECT @c1 = N'', @c2 = N'';
SELECT
@c1 = @c1 + ',' + QUOTENAME(name),
@c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
FROM sys.columns
WHERE name <> 'LocationID'
AND [object_id] = OBJECT_ID('dbo.table1');
SET @sql = ';WITH s AS (
SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
EXCEPT
SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
' + @c2;
SELECT @sql;
--EXEC sp_executesql @sql;
You can take the output of this query as-is and store it somewhere, or you can comment out the SELECT, uncomment the EXEC, and leave it as permanent dynamic SQL - in which case it will automatically adapt to column changes in the two tables.
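For instance, assuming dbo.table1 has columns LocationID, setting1, and setting2 (hypothetical names), the generated statement would look roughly like this (column order follows sys.columns and is not guaranteed):

```sql
;WITH s AS (
SELECT [setting1],[setting2] FROM dbo.table1
EXCEPT
SELECT [setting1],[setting2] FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
 AND m.[setting1] = s.[setting1] AND m.[setting2] = s.[setting2]
```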
Another idea, assuming LocationID is unique - it occurred to me that you may want to include the master row so you can quickly spot the columns that are different:
;WITH c AS
(
SELECT t.LocationID, m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
)
SELECT DISTINCT src = '> master', setting1, setting2, ...
FROM c
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM c
)
ORDER BY src;
This version is a little cheaper (mostly by avoiding the DISTINCT against the master table), at the cost of needing to specify all of the columns one more time - which, again, you can automate as above:
;WITH m AS
(
SELECT setting1, setting2, ...
FROM dbo.table1_master
),
c AS
(
SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN m
)
SELECT src = '> master', setting1, setting2, ...
FROM m
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT src, setting1, setting2, ...
FROM c
)
ORDER BY src;
However, all of these options perform worse, with worse plans, than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT, even though it is more about syntax than performance.
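For reference, Rachel's LEFT JOIN approach is essentially the following (sketched here with hypothetical column names; it matches what the dynamic SQL below generates):

```sql
SELECT t.LocationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m
  ON t.setting1 = m.setting1
 AND t.setting2 = m.setting2
WHERE m.setting1 IS NULL; -- any non-nullable joined column works for this check
```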
The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:
DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';
SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master');
SELECT TOP (1) @anycol = QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master')
ORDER BY name;
SET @sql = 'SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1'
+ @and + ' WHERE m.' + @anycol + ' IS NULL;';
SELECT @sql;
--EXEC sp_executesql @sql;
Here is a rewrite of what you already have.
- No shredding necessary on root node.
- Specify the text() node for the parameter name and parameter value.

In my limited tests that made it significantly faster. I look forward to seeing what performance gain, if any, you see on your side.
select T.TestId,
T.TestData.value('(/root/@name)[1]', 'varchar(max)') as [root],
C.X.value('@name', 'varchar(max)') as child,
GC.X.value('@name', 'varchar(max)') as grandchild,
null as greatgrandchild,
P.X.value('(name/text())[1]', 'varchar(max)') as parameter
from dbo.TestTable as T
cross apply T.TestData.nodes('/root/child') as C(X)
cross apply C.X.nodes('grandchild') as GC(X)
cross apply GC.X.nodes('parameter') as P(X)
where P.X.value('(value/text())[1]', 'varchar(max)') like '%[^Aa]BC%'
union all
select T.TestId,
T.TestData.value('(/root/@name)[1]', 'varchar(max)') as [root],
C.X.value('@name', 'varchar(max)') as child,
GC.X.value('@name', 'varchar(max)') as grandchild,
GGC.X.value('@name', 'varchar(max)') as greatgrandchild,
P.X.value('(name/text())[1]', 'varchar(max)') as parameter
from dbo.TestTable as T
cross apply T.TestData.nodes('/root/child') as C(X)
cross apply C.X.nodes('grandchild') as GC(X)
cross apply GC.X.nodes('greatgrandchild') as GGC(X)
cross apply GGC.X.nodes('parameter') as P(X)
where P.X.value('(value/text())[1]', 'varchar(max)') like '%[^Aa]BC%'
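To make the assumed document shape concrete, here is a hypothetical row (element and attribute names taken from the XPath expressions above; the values are made up) that the first branch of the query would match:

```sql
INSERT dbo.TestTable (TestId, TestData)
VALUES (1, N'<root name="r1">
  <child name="c1">
    <grandchild name="g1">
      <parameter><name>p1</name><value>XBC</value></parameter>
    </grandchild>
  </child>
</root>');
-- the value XBC satisfies the filter ''%[^Aa]BC%'' (BC preceded by a non-A character)
```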
Update:
I took the liberty of executing the test rig provided by wBob on SQL Server 2014, with compatibility level 110 (SQL Server 2012) and 120 (SQL Server 2014).
Result:
Compatibility level   OP's query   My query   wBob's query using XML indexes
-------------------   ----------   --------   ------------------------------
110                   64 sec       37 sec     1 sec
120                   8 sec        4 sec      5 sec
The difference in execution time is there because SQL Server 2014 uses a new cardinality estimator; at compatibility level 110, SQL Server uses the old estimator. Turning the new cardinality estimator on or off using trace flags yields exactly the same result.
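For example, the cardinality estimator can be forced per query, regardless of compatibility level, with the documented trace flags 9481 (legacy estimator) and 2312 (new estimator):

```sql
-- force the legacy (pre-2014) cardinality estimator for this query only
SELECT ... OPTION (QUERYTRACEON 9481);

-- force the new (2014) cardinality estimator for this query only
SELECT ... OPTION (QUERYTRACEON 2312);
```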
The difference in times for the queries that do not use the XML indexes is there because, at compatibility level 120, the plans are parallel (with DOP 16 in my case).
It is also really interesting to note that execution time when using the XML indexes is five times slower under the new cardinality estimator. The reason is the same as above, only in reverse: there is a parallel plan only when using the old cardinality estimator.
Best Answer
Count the number of rows for every distribution and compare that to the number of rows returned for a distribution where you apply your filter.
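A minimal sketch of that comparison, assuming a table dbo.t distributed on a hypothetical column DistKey (replace the placeholder filter with your own):

```sql
-- total rows per distribution value
SELECT DistKey, COUNT_BIG(*) AS total_rows
FROM dbo.t
GROUP BY DistKey;

-- rows per distribution value that survive the filter; compare against the totals
SELECT DistKey, COUNT_BIG(*) AS filtered_rows
FROM dbo.t
WHERE <your filter>
GROUP BY DistKey;
```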