Strategy to handle an SQL Server DB with too many files (BLOBs) in it

Tags: disk-space, performance, sql-server-2005

Scenario:
SQL Server 2005 database servicing an ASP.NET application (on a separate web server).

Database:
DB has about 5GB of "normal" data in it, and about 15GB of "files" (e.g. a 200KB PDF stored in an image (BLOB) column, that sort of thing). More files are being uploaded by users and are rapidly consuming more disk space (the DB could grow to 50GB in the next few months, mostly files).

Concerns:
Storing so many files in the DB is already causing problems (e.g. the sheer total size of the database makes occasional whole-DB backups and deployments difficult).

And we are worried there will be more problems (e.g. performance issues, perhaps caused by no longer being able to keep the whole DB in RAM).

Question:
What technical solution would you suggest for this problem? Store the files in the file system? Split the database in two and have a bigger, slower one for the files?

Further Details if Needed:
These files are not super-important and don't need very fast access times – a couple of seconds would be fine, and there are currently maybe a dozen SELECTs per hour at most. The other "normal" data in the DB includes information that is needed many times per second.

Best Answer

I look after a very similar database, currently 3TB and growing by 5GB per day.

  • Filestream (2008+) doesn't solve the backup/restore challenge; the Filestream containers are still included in database backups.
  • Filestream performs better than LOB storage for files over 1MB, according to Paul Randal's testing. It's workload-dependent at 256KB–1MB and generally worse below 256KB. (A minimal setup sketch follows this list.)
  • A big plus for Filestream in some environments is that it bypasses the buffer pool and uses the Windows system cache instead.
  • If you put the files on a filesystem, you lose transactional consistency with the database record. You've also added the overhead of backing up millions of individual files, which can be troublesome.
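
If Filestream looks like a fit (it would mean moving off 2005, since the feature arrived in SQL Server 2008), the setup is broadly as follows. This is only a sketch; every database, object and path name here is hypothetical:

    -- Enable Filestream for T-SQL and Win32 streaming access
    -- (it must also be enabled at the instance level in Configuration Manager)
    EXEC sp_configure 'filestream access level', 2;
    RECONFIGURE;

    -- Add a Filestream filegroup and container to a hypothetical database
    ALTER DATABASE MyAppDb ADD FILEGROUP DocumentsFS CONTAINS FILESTREAM;
    ALTER DATABASE MyAppDb
        ADD FILE (NAME = DocumentsFSData, FILENAME = 'D:\FSData\Documents')
        TO FILEGROUP DocumentsFS;

    -- Filestream tables need a ROWGUIDCOL column with a UNIQUE constraint
    CREATE TABLE dbo.Documents
    (
        DocumentId  UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE
                    DEFAULT NEWSEQUENTIALID(),
        FileName    NVARCHAR(260) NOT NULL,
        Content     VARBINARY(MAX) FILESTREAM NULL
    );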

Weigh up the pros and cons of Filestream and see if it fits your case. In our case, we took a different route and opted for partitioning the database so we can make use of partial availability/piecemeal restore.
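
To give a flavour of what piecemeal restore buys you: you restore PRIMARY and the critical filegroups first, bring the database online, and pull the big archive filegroups back in afterwards (online, on Enterprise edition). A rough sketch, all names hypothetical:

    -- Bring the critical part of the database online first
    RESTORE DATABASE MyAppDb
        FILEGROUP = 'PRIMARY', FILEGROUP = 'CurrentFG'
        FROM DISK = 'E:\Backups\MyAppDb_full.bak'
        WITH PARTIAL, NORECOVERY;

    RESTORE LOG MyAppDb
        FROM DISK = 'E:\Backups\MyAppDb_log.trn'
        WITH RECOVERY;

    -- The "normal" data is now usable; restore the large archive filegroup later
    RESTORE DATABASE MyAppDb
        FILEGROUP = 'ArchiveFG'
        FROM DISK = 'E:\Backups\MyAppDb_full.bak'
        WITH RECOVERY;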

One option that wasn't available to us, which you may have, is to mark older/archive filegroups as read-only. The read-only filegroup(s) can then be backed up infrequently.
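
In T-SQL that looks something like this (names hypothetical; note that switching a filegroup to read-only needs exclusive access to the database):

    -- Freeze the archive filegroup once its data stops changing
    ALTER DATABASE MyAppDb MODIFY FILEGROUP ArchiveFG READ_ONLY;

    -- Take one final backup of it...
    BACKUP DATABASE MyAppDb
        FILEGROUP = 'ArchiveFG'
        TO DISK = 'E:\Backups\MyAppDb_ArchiveFG.bak';

    -- ...then routine backups can cover only the filegroups that still change
    BACKUP DATABASE MyAppDb
        READ_WRITE_FILEGROUPS
        TO DISK = 'E:\Backups\MyAppDb_ReadWrite.bak';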

If you're stuck on 2005 Standard (partitioning is an Enterprise edition feature) and you have the option of making history read-only, you could tackle this the old-fashioned way.

  • Split your table. You could consider the active/history route, or a date-based split, e.g. one table per month.
  • Put the historic data on a read-only filegroup and back it up only when you archive further data. Ensure your users understand that this cuts backup time only; restores might still take a while without the partial availability feature.
  • Create a partitioned view over the tables, so the application still sees a single table (sketched below).
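
A minimal sketch of that active/history arrangement, with hypothetical table names, filegroups and an arbitrary cutoff date. The CHECK constraints on the partitioning column are what let the optimizer skip the table a query doesn't need, and keeping that column in the primary key keeps the view updatable:

    CREATE TABLE dbo.Files_Active
    (
        FileId      INT NOT NULL,
        UploadedOn  DATETIME NOT NULL,
        Content     VARBINARY(MAX) NOT NULL,
        CONSTRAINT PK_Files_Active PRIMARY KEY (FileId, UploadedOn),
        CONSTRAINT CK_Files_Active CHECK (UploadedOn >= '20080101')
    ) ON ActiveFG;

    CREATE TABLE dbo.Files_History
    (
        FileId      INT NOT NULL,
        UploadedOn  DATETIME NOT NULL,
        Content     VARBINARY(MAX) NOT NULL,
        CONSTRAINT PK_Files_History PRIMARY KEY (FileId, UploadedOn),
        CONSTRAINT CK_Files_History CHECK (UploadedOn < '20080101')
    ) ON HistoryFG;
    GO

    -- The application queries one view and never sees the split
    CREATE VIEW dbo.Files
    AS
    SELECT FileId, UploadedOn, Content FROM dbo.Files_Active
    UNION ALL
    SELECT FileId, UploadedOn, Content FROM dbo.Files_History;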

One final option (which we're considering for our 3TB blobber) is to move the file data to a document database or cloud storage (e.g. Amazon S3, Azure Blob Storage). This does introduce the transactional consistency problem I mentioned earlier, but it takes the load away from those very expensive SQL Servers.
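
If you go that route, SQL Server ends up holding only metadata plus a pointer to the external object, something like this hypothetical table (keeping the row and the external object in sync then becomes the application's job):

    CREATE TABLE dbo.FileReferences
    (
        FileId      INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
        FileName    NVARCHAR(260) NOT NULL,
        ContentType NVARCHAR(100) NOT NULL,
        SizeBytes   BIGINT NOT NULL,
        StorageKey  NVARCHAR(400) NOT NULL,  -- e.g. bucket/container + object key
        UploadedOn  DATETIME NOT NULL DEFAULT GETDATE()
    );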