SQL Server 2016 – How Does the Microsoft tablediff Utility Work?

sql serversql-server-2016

Our team is curious about the architecture of tablediff utility. We want to find the difference between table rows.

Microsoft mentions tablediff here, however does not state how it works. Does tablediff take the checksum or hash of every row and compare tables? What is the internal algorithm method?

https://docs.microsoft.com/en-us/sql/tools/tablediff-utility?view=sql-server-2017

create table dbo.CustomerTransaction
(
    CustomerTransactionId int primary key,
    CustomerName varchar(50),
    ProductName varchar(50),
    QuantityBought int
)

Using tablediff example:

So row values Table 1: (1,'Bob','Table',8) is the Same as Table 2: (1,'Bob','Table',8)

These are different (1,'Bob','Table',8), (1,'Bob','Chair',8), different on the primary key.

I know it requires the source table to have primary Key/Identity/rowguid column to compare, and was originally based on replication technology.

Best Answer

I'm not aware of authoritative documentation of the fact that the current version of the tablediff utility uses a checksum to identify data differences. However, I was able to verify this by capturing the queries executed by the utility using a sql_batch_completed event trace. The trace showed a query is run against each table, returning only the primary key value(s) along with the T-SQL BINARY_CHECKSUM over all table columns.

This is the T-SQL script I ran on my local SQL instance to the create tables and trace:

USE tempdb;
DROP TABLE IF EXISTS dbo.table1;
DROP TABLE IF EXISTS dbo.table2;
CREATE TABLE dbo.Table1(Col1 int NOT NULL CONSTRAINT PK_Table1 PRIMARY KEY, Col2 int);
CREATE TABLE dbo.Table2(Col1 int NOT NULL CONSTRAINT PK_Table2 PRIMARY KEY, Col2 int);
INSERT INTO dbo.Table1 VALUES
      (1, 1)
    , (2, 2)
    , (3, 3);
INSERT INTO dbo.Table2 VALUES
      (1, 1)
    , (2, 0)
    , (4, 4);

While viewing the trace in SSMS (Object Explorer: right-click Management-->Extended Events-->Sessions-->sql_batch_completed and select Watch Live Data), I ran this from the command prompt:

"C:\Program Files\Microsoft SQL Server\140\COM\tablediff" -sourceserver "." -sourcedatabase "tempdb" -sourceschema "dbo" -sourcetable "Table1" -destinationserver "." -destinationschema "dbo" -destinationdatabase "tempdb" -destinationtable "Table2"

These were the relevant queries executed by tablediff:

SELECT [dbo].[table1].[col1],BINARY_CHECKSUM([dbo].[table1].[col1],[dbo].[table1].[col2],[dbo].[table1].[col3]) as MShash_54267293 FROM [dbo].[table1] WITH (READUNCOMMITTED) ORDER BY [dbo].[table1].[col1]

SELECT [dbo].[table2].[col1],BINARY_CHECKSUM([dbo].[table2].[col1],[dbo].[table2].[col2],[dbo].[table2].[col3]) as MShash_54267293 FROM [dbo].[table2] WITH (READUNCOMMITTED) ORDER BY [dbo].[table2].[col1]