Sql-server – cross-platform way to compare data in one column on each side of replicated data, like a checksum or hash

checksum, hashing, oracle, replication, sql-server

I have an Oracle 12 database with lots of tables, and I am replicating several of the tables (a subset of rows) into a SQL Server 2016 database. The subset of rows can be established with a WHERE clause on the Oracle side.

I have two web services that can expose anything I want to from that data, one on each side.

Do you have a suggestion for an approach of what I can expose, then compare to find out if the data between the systems matches?

I am currently exposing the COUNT(*) from one table, which has a few million rows; that one is a no-brainer since it is very efficient. So far, so good.

I'm also exposing the SUM of each of a few NUMBER(18,2) columns and comparing it to the corresponding SUM on the SQL Server side. However, this is problematic, as it has to scan the whole table in SQL Server; it is sometimes blocked, and sometimes might cause other processes to block. I imagine similar problems could occur on the Oracle side too.
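For context, the checks behind those web service calls look roughly like this (my_table, amount, and the WHERE clause are placeholders for the real names):

    -- Oracle side: totals over the replicated subset (placeholder names)
    SELECT COUNT(*)    AS row_count,
           SUM(amount) AS amount_total
    FROM   my_table
    WHERE  region = 'US';  -- the same WHERE clause that selects the replicated rows

    -- SQL Server side: the corresponding totals on the replica
    SELECT COUNT_BIG(*) AS row_count,
           SUM(amount)  AS amount_total
    FROM   dbo.my_table;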

Also, the SUM will not tell me if the rows match; it will only tell me that the totals match. If an amount was improperly added to one row and subtracted from another, I wouldn't catch it.

I've pondered whether Oracle's STANDARD_HASH might help me, but it seems cumbersome or error-prone to try to generate the exact same HASH on the SQL Server side, and also this doesn't help with the inefficiency/blocking nature of the call.

So is there any way to have both databases keep track of a hash, checksum, CRC code, or other summary of a column's data, that is efficient to retrieve, and that I can then use to compare to have some idea whether data is the same on both sides? It need not be a perfect solution; for example, comparing SUMs is close but perhaps not quite good enough.

As a first stab, I created a "summary" indexed view, with columns derived from SUMs, on the SQL Server side. This makes querying the view very fast, but incurs an additional penalty on every write to the large table underneath. Still, I think it will work, but I'd like to improve on it. Other, better ideas?
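Roughly, the indexed view looks like the sketch below (placeholder names again); the unique clustered index is what makes SQL Server materialize and maintain the aggregates so that reading them is cheap:

    -- Placeholder names; the underlying table has a few million rows
    CREATE VIEW dbo.v_my_table_summary
    WITH SCHEMABINDING
    AS
    SELECT region,
           COUNT_BIG(*)           AS row_count,      -- required in an indexed view with GROUP BY
           SUM(ISNULL(amount, 0)) AS amount_total    -- SUM must be over a non-nullable expression
    FROM   dbo.my_table
    GROUP BY region;
    GO

    -- The unique clustered index is what turns this into a materialized ("indexed") view
    CREATE UNIQUE CLUSTERED INDEX ux_v_my_table_summary
        ON dbo.v_my_table_summary (region);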

Best Answer

The Oracle function for hashing is STANDARD_HASH(), which supports these methods: SHA1, SHA256, SHA384, SHA512, and MD5.

https://docs.oracle.com/database/121/SQLRF/functions183.htm#SQLRF55647
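A minimal sketch on the Oracle side (my_table, amount, and the WHERE clause are placeholders; the TO_CHAR format model and NLS parameter are one way to pin down a canonical text rendering of a NUMBER(18,2) before hashing):

    -- Per-row hash of a canonical string rendering of the column (placeholder names)
    SELECT STANDARD_HASH(
               TO_CHAR(amount,
                       'FM9999999999999990.00',
                       'NLS_NUMERIC_CHARACTERS=''.,'''),  -- pin the decimal separator regardless of session NLS settings
               'SHA256') AS amount_hash
    FROM   my_table
    WHERE  region = 'US';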

In SQL Server you can use HASHBYTES(), which supports these methods: MD2, MD4, MD5, SHA, SHA1, SHA2_256, and SHA2_512.

https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15
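And a counterpart sketch on the SQL Server side (again placeholder names; HASHBYTES takes the algorithm name first and returns varbinary, so compare the raw bytes or convert both sides to hex). Whether the string renderings really line up with the Oracle output is exactly what the test described next should confirm:

    -- Per-row hash of the same canonical string rendering on the replica (placeholder names)
    -- For digits, '.' and '-', the varchar bytes match Oracle's UTF-8 bytes, so the hashes can agree
    SELECT HASHBYTES('SHA2_256',
                     CONVERT(varchar(20), amount)) AS amount_hash   -- decimal(18,2) renders as e.g. '123.45'
    FROM   dbo.my_table;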

As they both support SHA-256 and SHA-512 (Oracle calls them SHA256/SHA512, SQL Server SHA2_256/SHA2_512), I would start by setting up a few rows of test data and checking that the two functions produce the same result.

A search also throws up the following link: http://roelantvos.com/blog/hashing-in-sql-server-and-oracle-for-the-same-output/

EDIT: I reread the comments and saw that you were concerned that conversions of decimals to strings, etc. are not standardized between the two platforms. I think you could fairly easily address this by using the schema and type catalog views to dynamically construct the conversion queries with CASE expressions, as sketched below, and at least it only has to be done on one side.
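For example, on the SQL Server side a sketch along these lines could read the catalog and emit one canonical conversion expression per column (placeholder table name; the CASE branches and formats would need to be matched to whatever TO_CHAR formats are chosen on the Oracle side):

    -- Sketch: build canonical "stringify before hashing" expressions from the catalog,
    -- so the conversion rules live in one place instead of being hand-written per column
    SELECT c.COLUMN_NAME,
           CASE c.DATA_TYPE
                WHEN 'decimal'  THEN 'CONVERT(varchar(40), '  + QUOTENAME(c.COLUMN_NAME) + ')'
                WHEN 'numeric'  THEN 'CONVERT(varchar(40), '  + QUOTENAME(c.COLUMN_NAME) + ')'
                WHEN 'datetime' THEN 'CONVERT(varchar(23), '  + QUOTENAME(c.COLUMN_NAME) + ', 121)'  -- yyyy-mm-dd hh:mi:ss.mmm
                ELSE                 'CONVERT(varchar(100), ' + QUOTENAME(c.COLUMN_NAME) + ')'
           END AS canonical_expression
    FROM   INFORMATION_SCHEMA.COLUMNS AS c
    WHERE  c.TABLE_SCHEMA = 'dbo'
      AND  c.TABLE_NAME   = 'my_table'
    ORDER BY c.ORDINAL_POSITION;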