Sql-server – Are duplicated NVARCHAR values stored as copies in SQL Server

azure-sql-databasedatabase-designslowly-changing-dimensionsql server

I'm designing a table that will contain a lot of rows. So need to be careful not to store to much information. One of the columns is a NVARCHAR(MAX) column and it contains the address of our customers. As addresses do not change often, this column will contain many repeated values and thus contains quite some redundancy.

So I was wondering if I need to normalize this myself by maintaining some sort of look-up table to address strings (note that if an address changes I need to maintain history – so it's not a matter of usual normalization), or if SQL Server is pointing to the same reference of the string behind the scenes. Or maybe it offers a column option to do so. Another approach that came into my mind is to use COMPRESS but I guess this does not make sense as the data itself (i.e. the address) is not long.

Reading/writing performance is not so much of a concern as the data will be accumulated over time.

Best Answer

Yes duplicated data is stored as copies in SQL Server

To change this behavior, you would need to implement PAGE COMPRESSION feature - create or rebuild indexes with (data_compression = on) option

It is a great feature that helps to save space. Once you enable it, SQL Server is pointing to the same reference of the string behind the scenes

Beware that PAGE COMPRESSION is not available in every SQL Server Edition, and it can have some CPU overhead

So you might want to make a lookup table if your edition does not allow for page compression