Sql-server – Should I use the deprecated MD5 function in SQL Server

hashingsql serversql-server-2017

We'd like to use MD5 for our hashing function instead of SHA_256 but as of SQL Server 2016, MD5 is deprecated. We're using this for hashing (comparing which records have changed). We now have this dilemma of using of risking it by using this function or incurring storage and performance overhead of using SHA_256. It's frustrating Microsoft decided to deprecate these functions even though they are still useful in certain scenarios.

This project isn't a critical component of the business. We'll likely go with SHA_256 but is this the right choice? Should new development always avoid deprecated functions?

For context – daily will be about 1-2 million upserting into a 400 million row table comparing hashbytes on the fly. about 30 columns wide

https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-2017

Choosing the right algorithm in HashBytes function

Best Answer

From Microsoft's Documentation:

When a feature is marked deprecated, it means:

  • The feature is in maintenance mode only. No new changes will be done, including those related to inter-operability with new features.
  • We strive not to remove a deprecated feature from future releases to make upgrades easier. However, under rare situations, we may choose to permanently remove the feature from SQL Server if it limits future innovations.
  • For new development work, we do not recommend using deprecated features.

Another part of that page says:

The following SQL Server Database Engine features are supported in the next version of SQL Server, but will be deprecated in a later version. The specific version of SQL Server has not been determined.

Depending on how you're using MD5 and HASHBYTES, you may have other options available aside from using SHA_256.

You may be interested in using this methodology I documented to determine which rows have changes. It uses a rowversion column in the source table to allow you to instantly see rows that are new or have changes. This avoids using a deprecated feature, and is much faster than any hashing function will ever be.