SQL Server – Why might a table's data space take up 4x the size of the raw data?

disk-space, sql-server-2008

I have a table with 490 M rows and 55 GB of table space, so about 167 bytes per row. The table has three columns: a VARCHAR(100), a DATETIME2(0), and a SMALLINT. The average length of the text in the VARCHAR field is about 21.5, so the raw data should be around 32 bytes per row: 22+2 for the VARCHAR, 6 for the DATETIME2, and 2 for the 16-bit integer.
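(As a sanity check, the raw per-row figure can be measured directly with DATALENGTH. This is only a sketch; the table and column names — dbo.PingLog, Url, PingTime, LatencyMs — are placeholders, not the real ones:)

    -- Average raw (in-row) bytes per row: actual data only, no row/page overhead.
    -- Table and column names here are hypothetical.
    SELECT AVG(CAST(DATALENGTH(Url) + DATALENGTH(PingTime) + DATALENGTH(LatencyMs) AS float)) AS avg_raw_bytes_per_row
    FROM dbo.PingLog;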

Note that the space above is data only, not indices. I'm using the value reported under Properties | Storage | General | Data space.
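(If you prefer to read the same figure from T-SQL rather than the Properties dialog, something along these lines should report comparable numbers; the table name is again a placeholder.)

    -- Reserved/data/index/unused space for one table.
    EXEC sp_spaceused N'dbo.PingLog';

    -- Roughly the same information from the DMVs, counted in 8 KB pages.
    SELECT i.index_id, i.name,
           SUM(ps.in_row_data_page_count) AS data_pages,
           SUM(ps.used_page_count)        AS used_pages
    FROM sys.dm_db_partition_stats AS ps
    JOIN sys.indexes AS i
      ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    WHERE ps.object_id = OBJECT_ID(N'dbo.PingLog')
    GROUP BY i.index_id, i.name;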

Of course there must be some overhead, but 135 bytes per row seems like a lot, especially for a large table. Why might this be? Has anyone else seen similar multipliers? What factors can influence the amount of extra space required?

For comparison, I tried creating a table with two INT fields and 1 M rows. The data space required was 16.4 MB: 17 bytes per row, compared to 8 bytes of raw data. Another test table with an INT and a VARCHAR(100) populated with the same text as the real table uses 39 bytes per row (44 K rows), where I would expect 28 plus a little.
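(For anyone who wants to repeat the comparison, the two-INT test can be reproduced with something like the following; the names and the row-generation trick are just one way to do it.)

    -- Hypothetical repro of the two-INT test: 1 M rows, clustered PK on the first column.
    CREATE TABLE dbo.TestTwoInts (Id INT NOT NULL PRIMARY KEY CLUSTERED, Val INT NOT NULL);

    ;WITH n AS (
        SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
        FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
    )
    INSERT INTO dbo.TestTwoInts (Id, Val)
    SELECT i, i FROM n;

    EXEC sp_spaceused N'dbo.TestTwoInts';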

So the production table has considerably more overhead. Is this because it's larger? I'd expect index sizes to be roughly N * log(N), but I don't see why the space required for the actual data would be non-linear.

Thanks in advance for any pointers!

EDIT:

All of the fields listed are NOT NULL. The real table has a clustered PK on the VARCHAR field and the DATETIME2 field, in that order. For the two tests, the first INT was the (clustered) PK.

If it matters: the table is a record of ping results. The fields are URL, ping date/time, and latency in milliseconds. Data is constantly appended, and never updated, but data is deleted periodically to cut it down to just a few records per hour per URL.
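(A minimal sketch of what such a table might look like, purely for illustration; the real DDL isn't shown in the question, and all names are assumed.)

    -- Hypothetical DDL matching the description: URL, ping timestamp, latency in ms,
    -- with the clustered PK on (URL, timestamp), in that order.
    CREATE TABLE dbo.PingLog (
        Url       VARCHAR(100) NOT NULL,
        PingTime  DATETIME2(0) NOT NULL,
        LatencyMs SMALLINT     NOT NULL,
        CONSTRAINT PK_PingLog PRIMARY KEY CLUSTERED (Url, PingTime)
    );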

EDIT:

A very interesting answer here suggests that, for an index with much reading and writing, rebuilding may not be beneficial. In my case, the space consumed is a concern, but if write performance is more important, one may be better off with flabby indices.

Best Answer

After discussions in the comments on the original question, it appears that, in this case, the lost space is caused by the choice of clustered key, which has led to massive fragmentation.

It is always worth checking the state of fragmentation via sys.dm_db_index_physical_stats in these situations.
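For example, a query along these lines (the table name is a placeholder; note that the SAMPLED or DETAILED scan mode is needed, since LIMITED leaves avg_page_space_used_in_percent NULL):

    -- Fragmentation and page density for all indexes on the table.
    SELECT index_id, index_type_desc, alloc_unit_type_desc,
           avg_fragmentation_in_percent, avg_page_space_used_in_percent, page_count
    FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.PingLog'), NULL, NULL, 'SAMPLED');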

Edit: Following update in comments

The average page density (prior to rebuild of the clustered index) was 24%, which fits perfectly with the original question. The pages were only 1/4 full, so the total size was 4x the raw data size.
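(If space is the priority, rebuilding the clustered index compacts the pages back toward full; a sketch is below, with assumed names. Bear in mind that for an insert/delete-heavy table the density will drift back down over time, per the trade-off mentioned in the question's second edit.)

    -- Rebuild the clustered index to restore page density (names are hypothetical).
    -- ONLINE = ON requires Enterprise Edition; omit it otherwise.
    ALTER INDEX PK_PingLog ON dbo.PingLog
    REBUILD WITH (FILLFACTOR = 100, ONLINE = ON);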