Sql-server – Identifying rows which don’t match a master row

Tags: except, query, sql-server, sql-server-2005

I'm comparing a bunch of tables from different databases on different servers to a Master record. I need to know which servers, identified by locationID, have the non-matching rows because they might need maintenance.

I've got a simple EXCEPT query against table1, which has one row per server holding that server's full configuration plus locationID, a column that tells me which server it is. I compare these rows to a table1_master table that has the right settings, but I exclude locationID since it won't match.

Simple query below:

SELECT everything, but, locationID
FROM table1
EXCEPT
SELECT everything, but, locationID
FROM table1_master

There's only one master row I compare all servers to, and I don't select its locationID here.

This is an example of the rows I'm comparing. Each has a primary key, a single varchar column, and a long list of settings spanning dozens of columns. I want to compare all columns except LocationID, but I need LocationID to identify the rows.

LocationID           setting    setting    setting    setting
CS02          C      Y          Y          Y          Y
CS03          C      Y          Y          Y          Y
CS06          C      Y          N          Y          Y

In this example say CS02 is my Master record, so since all settings are the same in CS02 and CS03, those rows don't show up, but CS06's does. But in my EXCEPT query, I'm not actually catching LocationID so I don't actually know which row was returned.

This returns the rows I need but NOT the locationID, so I don't know which rows are wrong. Is there any way I can include locationID in the result set while kicking out the matching rows?

The solution I thought of was to make a row for each server in the table1_master table, so each locationID is represented, but they all have the same data other than that. My EXCEPT query should then return the locationID and my info, but is that the best way to do it?

Best Answer

You can also do this with dynamic SQL without having to manually build out all the column names.

DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);

SELECT @c1 = N'', @c2 = N'';

-- @c1 collects the column list, @c2 the join predicates (LocationID excluded)
SELECT 
  @c1 = @c1 + ',' + QUOTENAME(name),
  @c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
  FROM sys.columns
  WHERE name <> 'LocationID'
  AND [object_id] = OBJECT_ID('dbo.table1');

-- s = setting combinations that differ from the master; join back to table1 to get LocationID
SET @sql = ';WITH s AS (
       SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
       EXCEPT 
       SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
     ) 
     SELECT m.LocationID
     FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
     ' + @c2;

SELECT @sql;
--EXEC sp_executesql @sql;

You can take the output of this query as is and store the query somewhere, or you can comment out the SELECT and uncomment the EXEC and leave it as permanent dynamic SQL - in this case it will automatically adapt to column changes in the two tables.
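
As a rough illustration of the same idea outside SQL Server, the sketch below uses Python's built-in sqlite3 module as a stand-in. It reads the column names from the table metadata, builds the EXCEPT query as a string, and joins the result back to table1 to recover LocationID. The sample rows, the setting1 to setting4 column names, and the find_mismatches and demo helpers are assumptions for illustration; SQLite's PRAGMA table_info takes the place of sys.columns.

```python
import sqlite3


def find_mismatches(conn, table="table1", master="table1_master", key="LocationID"):
    """Return the key values of rows in `table` whose settings differ from the master."""
    # Read column names from metadata (SQLite's counterpart to sys.columns),
    # skipping the key column because it never matches the master row.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')
            if row[1].lower() != key.lower()]
    col_list = ", ".join(f'"{c}"' for c in cols)
    join_on = " AND ".join(f'm."{c}" = s."{c}"' for c in cols)
    # s holds the setting combinations not found in the master;
    # joining s back to the table recovers which rows they came from.
    sql = (
        f'WITH s AS (SELECT {col_list} FROM "{table}" '
        f'EXCEPT SELECT {col_list} FROM "{master}") '
        f'SELECT m."{key}" FROM s JOIN "{table}" AS m ON {join_on} '
        f'ORDER BY m."{key}"'
    )
    return [r[0] for r in conn.execute(sql)]


def demo():
    """Build hypothetical sample tables in memory and run the comparison."""
    conn = sqlite3.connect(":memory:")
    ddl = ("(LocationID TEXT PRIMARY KEY, setting1 TEXT, setting2 TEXT, "
           "setting3 TEXT, setting4 TEXT)")
    conn.execute("CREATE TABLE table1 " + ddl)
    conn.execute("CREATE TABLE table1_master " + ddl)
    rows = [
        ("CS02", "Y", "Y", "Y", "Y"),
        ("CS03", "Y", "Y", "Y", "Y"),
        ("CS06", "Y", "N", "Y", "Y"),
    ]
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute("INSERT INTO table1_master VALUES (?, ?, ?, ?, ?)", rows[0])
    return find_mismatches(conn)


if __name__ == "__main__":
    print(demo())
```

Because the column list comes from metadata each time, adding a setting column to both tables needs no change to the helper, which is the same property the dynamic SQL version relies on.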

Another idea (assuming LocationID is unique) - and it occurred to me you may want to include the master row so you can quickly spot the columns that are different:

  ;WITH c AS 
  (
    SELECT t.LocationID, m.setting1, m.setting2, ...
      FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
  )
  SELECT DISTINCT src = '> master', setting1, setting2, ...
    FROM c
  UNION ALL
  (
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM dbo.table1
    EXCEPT
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM c
  )
  ORDER BY src;

This version is a little cheaper (mostly by avoiding the DISTINCT against the master table, at the cost of needing to specify all of the columns one more time - which again you can automate as per above):

  ;WITH m AS 
  (
    SELECT setting1, setting2, ... 
      FROM dbo.table1_master
  ),
  c AS 
  (
    SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
      FROM dbo.table1 AS t CROSS JOIN m
  )
  SELECT src = '> master', setting1, setting2, ...
    FROM m
  UNION ALL
  (
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM dbo.table1
    EXCEPT
    SELECT src, setting1, setting2, ...
      FROM c
  )
  ORDER BY src;

However, all of these options are poorer performers with worse plans than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT even though it is more about syntax than performance.

The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:

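DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';

-- One equality predicate per column; LocationID is skipped because it never matches the master
SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID('dbo.table1_master')
  AND name <> 'LocationID';

-- Any master column will do for the IS NULL test on unmatched rows
SELECT TOP (1) @anycol = QUOTENAME(name)
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID('dbo.table1_master')
  ORDER BY name;

-- Qualify LocationID with t so it is unambiguous if the master table also has that column
SET @sql = 'SELECT t.LocationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1' 
  + @and + ' WHERE m.' + @anycol + ' IS NULL;';

SELECT @sql;
--EXEC sp_executesql @sql;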
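
The shape of the LEFT JOIN query can be tried out on a small scale. This is a minimal sketch using Python's built-in sqlite3 module as a stand-in for SQL Server, with made-up sample rows, assumed column names setting1 to setting4, and hypothetical helper names. The LEFT JOIN matches each server row against the master on every setting, and rows with no match come back with NULLs on the master side, which the WHERE clause then keeps.

```python
import sqlite3

SETTINGS = ["setting1", "setting2", "setting3", "setting4"]  # assumed column names


def build_left_join_query(settings):
    """Build an anti-join: rows of table1 that match no master row on every setting."""
    on_clause = " AND ".join(f"t.{c} = m.{c}" for c in settings)
    return (
        "SELECT t.LocationID FROM table1 AS t "
        f"LEFT OUTER JOIN table1_master AS m ON {on_clause} "
        # Any master column works for the NULL test; unmatched rows have all-NULL m.*
        f"WHERE m.{settings[0]} IS NULL ORDER BY t.LocationID"
    )


def mismatched_locations():
    """Load hypothetical sample data and return the LocationIDs that differ."""
    conn = sqlite3.connect(":memory:")
    cols = ", ".join(f"{c} TEXT NOT NULL" for c in SETTINGS)
    conn.execute(f"CREATE TABLE table1 (LocationID TEXT PRIMARY KEY, {cols})")
    conn.execute(f"CREATE TABLE table1_master (LocationID TEXT PRIMARY KEY, {cols})")
    rows = [
        ("CS02", "Y", "Y", "Y", "Y"),
        ("CS03", "Y", "Y", "Y", "Y"),
        ("CS06", "Y", "N", "Y", "Y"),
    ]
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", rows)
    conn.execute("INSERT INTO table1_master VALUES (?, ?, ?, ?, ?)", rows[0])
    return [r[0] for r in conn.execute(build_left_join_query(SETTINGS))]


if __name__ == "__main__":
    print(mismatched_locations())
```

The join predicates leave LocationID out on purpose, so the master's own LocationID never prevents a match.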

Related Solutions

Sql-server – How to calculate values based on the previous row after skipping the first 12 rows

So this isn't a great answer, this is kind of a starting answer for somebody else to take on and refine this better. But I'll make a stab at it.

First I have a question: Are you trying to retain this in a view? I don't think you can for what you're wanting to do, it's kinda complicated, so let's examine the operations that you need to do to actually do what you want.

You stated that you want the first 12 rows to be static every time, and they should always have their last column set as NULL, and the others should retain their value. So that's a business rule that we need to encode in SQL. But before we encode this as a rule, let's ask if there's a way to ENSURE that those 12 rows are the RIGHT rows every time. If we can make that assumption, then we can do this as part of the next step.

Your next requirement is to do a calculation on each row using the previous row. Since the first 12 rows are static (and I presume not calculated), we don't have to ask "what about the first row". The easiest way to do calculations on the previous row is to assign a rownum to each row, then use the rownum in a comparison. This meshes with the previous requirement.

So we should start by doing our select and assigning a rownum as well, like this:

SELECT
    ROW_NUMBER() OVER (ORDER BY i.I_Date) AS rownum,
    i.I_Date,     -- Date
    i.I_O_P,      -- Money
    i.I_O_H,      -- Money
    i.I_O_L,      -- Money
    i.I_C_O,      -- Money
    c.AMPS12_C,   -- Money
    CAST(0.0 AS Money) AS C12WR  -- placeholder, filled in by the updates that follow
FROM
    dbo.IC_Raw_In AS i  -- IN is a reserved word, so the alias is i
INNER JOIN
    dbo.AMPS12_C AS c ON i.I_Serial = c.i_serial

But for the way I would do this, I would funnel these values into a temp table, and then use that to work out what I need. That way you can just refer to the columns in subsequent calls, like this:

UPDATE t 
SET C12WR = NULL
FROM temptable t
WHERE t.rownum < 12 -- see how we set the values = null here?

UPDATE t 
SET C12WR = 510.3958
FROM temptable t
WHERE t.rownum = 12 -- see how we set the value to something static? 
                    -- If this were a stored procedure we could use a value passed in here

and then we continue with:

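UPDATE t 
SET C12WR = ( ( t2.C12WR * 11.0 ) + t.I_C_O ) / 12.0
FROM temptable t
INNER JOIN temptable t2 ON t2.rownum = (t.rownum - 1) -- t2 is the previous row
WHERE t.rownum > 12
-- Note: a single UPDATE reads t2.C12WR as it was before the statement ran,
-- so rows past the 13th need this applied one rownum at a time (e.g. in a loop)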

Using this logic: from the 13th row onward, the C12WR column = (prevrow.C12WR * 11 + currow.I_C_O) / 12
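
The recurrence reads more easily as a loop. Here is a minimal sketch in Python; the function name rolling_c12wr and its parameters are made up for illustration, and floats stand in for the Money type. Each row from the 13th on needs the previous row's already-computed C12WR, so the values are produced in order, one row at a time.

```python
def rolling_c12wr(closes, seed, seed_row=12):
    """Compute C12WR for each row.

    closes   -- I_C_O values in date order (row 1 first)
    seed     -- static value assigned to the seed row (510.3958 in the example above)
    seed_row -- rownum that receives the seed; earlier rows stay None (NULL)
    """
    result = []
    prev = None
    for rownum, close in enumerate(closes, start=1):
        if rownum < seed_row:
            value = None                            # rows before the seed: NULL
        elif rownum == seed_row:
            value = seed                            # the static starting value
        else:
            value = (prev * 11.0 + close) / 12.0    # previous C12WR carries forward
        result.append(value)
        prev = value
    return result


if __name__ == "__main__":
    sample = [10.0] * 12 + [22.0, 23.0]   # hypothetical I_C_O values
    print(rolling_c12wr(sample, seed=10.0))
```

With the sample values, row 13 is (10 * 11 + 22) / 12 = 11 and row 14 is (11 * 11 + 23) / 12 = 12, which shows row 14 depending on the freshly computed row 13 rather than on its placeholder value.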

And then you would just return the values that you wanted from the temptable.

Notice: the things I left off. I did not define the temp table, I did not get rid of the temptable. I did not use appropriate syntax for the temptable addressing. I did not validate anything. I presumed that this was going to be used in a stored procedure. I did not illustrate how to use the static value as a stored procedure passed parameter.

Hope this helps. Hope someone else helps make this a better answer ;)

Sql-server – Which is quicker: Select of existing row vs Update where no row exists

Can you not use the MERGE statement added with SQL Server 2008 to "UPSERT" in one atomic operation?

DECLARE @FilterValue INT

;MERGE 
INTO ChildFilterTable AS CFT
USING  (your filter, source thing here)
                  ON (CFT.ChildID = ...)
WHEN MATCHED
    THEN update stuff
WHEN NOT MATCHED BY TARGET
    THEN insert stuff;
