Sql-server – Why does imported txt table consume so much space

sql serversql-server-2016storage

I have a .txt file with storage size 1253 MB. After importing it to SQL Server 2016, the table takes 1680 MB on disk. Why does the storage space grow so much?

I didn't do any data transformation and use pure char format to store data. I read the data dictionary of the txt file and if the length of a variable is at most n, then I use varchar(3). If the file is fixed length n, then I used char(n).

About 4 to 5 columns are densely populated and all the other columns are very sparse. For the sparse columns, only 1% rows are not null.

Here is the code creating the table and importing data:

CREATE TABLE table1 (
var1 char(12),      var2 char(6),      var3 varchar(12),       var4 varchar(3),
var5 varchar(3),    var6 varchar(4),   var7 char(1),           var8 char(1),
var9 char(2),       var10 char(6),     var11     varchar(8),   var12 varchar(12),
var13 char(6),      var14 varchar(12), var15 varchar(14),var16 varchar(12),
var17 varchar(12),  var18 varchar(12), var19 varchar(12),var20 varchar(12),
var21 varchar(12),  var22 varchar(12));

How can I reduce the storage space?

Here is the storage information from command sp_spaced:

name       table1
rows       22260960
reserved   1721240 KB
data       1721128 KB
index_size 24 KB
unused     88 KB

Best Answer

There is a metadata storage overhead for columns in SQL Server that you do not have to pay for in a flat file. Using compression can reduce this overhead. From Row Compression Implementation:

It reduces the metadata overhead that is associated with the record. This metadata is information about columns, their lengths and offsets. In some cases, the metadata overhead might be larger than the old storage format.

Try compressing your table with row or page compression and see if that reduces the space required in SQL Server.

ALTER TABLE table1 REBUILD WITH (DATA_COMPRESSION = ROW);

In my experience, compression can both improve and degrade query performance. It depends on the data and the query that you're running. It's possible that you'll need to make a trade off between space and query performance.