I'm maintaining a SQL Server 2005 database which hosts approximately 2.9 TB of data (2 x 1.45 TB – I have a RAW schema and an ANALYSIS schema, so basically two copies of the ingested data). The recovery model is SIMPLE and the .ldf is at 6 GB.
For whatever reason, the .mdf is 7.5 TB. Now, there are only maybe 2-3 additional columns in the ANALYSIS tables, and not many NVARCHAR(MAX) columns which, from what I understand (perhaps mistakenly – please correct me if I'm wrong), may be causing additional space allocation. That's after shrinking the database just now – it was at ~9 TB prior to that. Any thoughts?
And, please, let me know if you have additional questions – I'm very new to database administration and optimization efforts (I usually don't do this side of the job :)).
Many thanks!
Andrija
Best Answer
In your size estimates, have you taken into account the amount of space taken by indexes? Also, if you have text fields that are set as multi-byte (N[VAR]CHAR rather than [VAR]CHAR) and the input files are UTF-8 or plain one-byte-per-character, then that will push your storage requirements up by up to a factor of two.

Furthermore, remember that if you have a clustered key/index on a table, its size affects all other indexes on the table, because they include the clustered key value for every row. To give an extreme example: if a table has an NCHAR(10) key where an INT would do, and that is your clustered key/index, you are not only using an extra 16 bytes per row in the data pages, you are also wasting 16 bytes per row in every other index on that table.

Also, some space will be allocated but unused, either because the DB engine has left some space allocated after deletes so that it can be used again quickly for new data in that table, or because the pattern of inserts and deletes has left many pages only part full.
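The 16-byte figure in the example above can be checked directly: NCHAR(10) always stores ten two-byte characters against the four bytes of an INT (the literal values here are arbitrary):

```sql
-- NCHAR(10) is fixed at 20 bytes; INT is 4 bytes; difference = 16 bytes
-- per row, paid again in every nonclustered index on the table.
SELECT DATALENGTH(CAST(N'ABC1234567' AS NCHAR(10))) AS nchar10_bytes,  -- 20
       DATALENGTH(CAST(1 AS INT))                   AS int_bytes;      -- 4
```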
You can run a quick query against the system views to see which tables are taking up space.
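A sketch of such a query, using the sys.objects and sys.dm_db_partition_stats views mentioned later in this answer (the column aliases are my own):

```sql
-- Per-table reserved space; reserved_page_count counts 8 KB pages.
SELECT  o.name AS table_name,
        SUM(CASE WHEN ps.index_id IN (0, 1)       -- heap or clustered index only,
                 THEN ps.row_count ELSE 0 END)    -- so rows aren't double-counted
            AS row_count,
        SUM(ps.reserved_page_count) * 8 / 1024.0 AS reserved_mb
FROM    sys.dm_db_partition_stats AS ps
        JOIN sys.objects AS o ON o.object_id = ps.object_id
WHERE   o.is_ms_shipped = 0                       -- skip system objects
GROUP BY o.name
ORDER BY reserved_mb DESC;
```

The reserved figure includes data, index, and allocated-but-unused pages, which is usually what you want when hunting for a bloated .mdf.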
Also, EXEC sp_spaceused run within that DB will return two result sets: the first lists the total space allocated in the filesystem for the data files and how much of that is unallocated; the second lists how much of the allocated space is used for data pages, for index pages, or is currently unused.

sp_spaceused will return the space used by a given object too, so you can loop it over every table to build a table for analysis, giving all the table sizes in one list plus a single row for the totals. If needed, you can use the various system views (like sys.objects and sys.dm_db_partition_stats used in the first query above; see http://technet.microsoft.com/en-us/library/ms177862.aspx for much more detail) to get more details, such as the space used by each index.
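That loop can be sketched roughly like this; the cursor approach and the #sizes temp table are my assumptions, not necessarily how you would choose to write it:

```sql
-- Collect per-object sp_spaceused output for every user table.
CREATE TABLE #sizes (
    [name]       SYSNAME,
    [rows]       BIGINT,
    [reserved]   VARCHAR(32),
    [data]       VARCHAR(32),
    [index_size] VARCHAR(32),
    [unused]     VARCHAR(32)
);

DECLARE @tbl SYSNAME;
DECLARE c CURSOR FOR
    SELECT QUOTENAME(SCHEMA_NAME(schema_id)) + '.' + QUOTENAME(name)
    FROM sys.tables;
OPEN c;
FETCH NEXT FROM c INTO @tbl;
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO #sizes          -- sp_spaceused returns exactly these 6 columns
        EXEC sp_spaceused @tbl;
    FETCH NEXT FROM c INTO @tbl;
END
CLOSE c;
DEALLOCATE c;

SELECT * FROM #sizes ORDER BY [rows] DESC;
DROP TABLE #sizes;
```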
There are three classes of unused space in a data file:

- space allocated to the database files but not to any object (the "unallocated space" figure in the first result set of sp_spaceused with no object specified);
- space allocated to objects but not currently in use (the "unused" figure in sp_spaceused's output);
- space locked up in partly full pages, left behind by the patterns of inserts and deletes described above.

Another caveat here is large objects (TEXT columns, [N]VARCHAR(MAX) values above a certain size, and so on): they get placed off-page, taking only a small pointer (16 or 24 bytes, depending on the type) in the main row data to reference the data stored elsewhere, and so their values can exceed the 8,060-byte per-row limit.

tl;dr: Estimating expected database sizes can be a lot more involved than it is natural to initially assume.