SQL Server – Is SQL Server's FILETABLE suitable for storing large files (bigger than 10 GB)?

Tags: architecture, backup, filetable, sql-server

There is a project for a lab. The requirement is to save the lab data (raw data and related information) for research for years.

So the data must be kept for years, but the data is large: each raw data file is more than 10 GB. If we store the raw data in SQL Server's FILETABLE and the related information in a standard SQL Server table, then after months or years the database will grow so big that we must move some historical data out of the database.

Maybe we could add files (on a different hard disk) to the FILESTREAM filegroup backing the FILETABLE (which stores the raw data), but I feel that makes the SQL Server harder to maintain. We must keep the raw data and the related information consistent.
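For reference, growing a FILESTREAM filegroup onto another disk looks roughly like this (database, file, and filegroup names below are hypothetical):

```sql
-- A sketch only; all names are hypothetical.
-- For a FILESTREAM filegroup, FILENAME points at a directory: its parent
-- must exist, the directory itself must not. Adding a file on another
-- volume grows the filegroup onto that disk.
ALTER DATABASE LabData
ADD FILE (
    NAME = 'LabRawData2',
    FILENAME = 'E:\FileStreamData\LabRawData2'
)
TO FILEGROUP FileStreamGroup;
```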

We have considered using tape for the historical raw data: keep the latest raw data on hard disk and move the historical raw data to tape. Whenever we move historical data to tape, we would record the move in a log table, so we know where each piece of historical data lives and can retrieve it quickly. A sketch of that log table follows.
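Something like this, where all table and column names are made up for illustration:

```sql
-- A minimal sketch of the proposed archive log; all names hypothetical.
CREATE TABLE dbo.ArchiveLog (
    ArchiveLogId int IDENTITY(1,1) PRIMARY KEY,
    RawDataId    uniqueidentifier NOT NULL,   -- stream_id of the archived file
    TapeLabel    nvarchar(50)  NOT NULL,      -- which tape it was moved to
    ArchivedAt   datetime2     NOT NULL DEFAULT SYSUTCDATETIME(),
    Notes        nvarchar(400) NULL
);

-- Recorded at the moment a raw data file is moved to tape.
INSERT INTO dbo.ArchiveLog (RawDataId, TapeLabel, Notes)
VALUES ('6F9619FF-8B86-D011-B42D-00C04FC964FF', 'TAPE-0017', N'Lab run 42 raw data');
```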

Is there any good advice on:

  1. How to store the raw data and related information?
  2. Is FILETABLE suitable for this scenario?
  3. Is there a better solution for this scenario?

Best Answer

FILETABLE is built on the FILESTREAM feature of SQL Server, which uses the Windows Server filesystem to store VARBINARY(MAX) data as discrete NTFS files on the server itself. This data is not stored in the database files, so the database file size will not include these 10 GB+ files. Database backups do, by default, include the FILESTREAM data; you can exclude it by taking partial backups that leave out the FILESTREAM filegroups. FILESTREAM data is also not cached in SQL Server's buffer pool; the operating system's file cache is used instead.
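As a sketch of that partial-backup approach (filegroup and path names are assumptions, and note that a filegroup must be read-only for a READ_WRITE_FILEGROUPS backup to skip it):

```sql
-- A sketch, assuming the FILESTREAM filegroup is called FileStreamGroup.
-- A partial backup covers PRIMARY plus all read-write filegroups, so a
-- read-only FILESTREAM filegroup is left out. Making a filegroup
-- read-only requires exclusive access and stops writes to it, so this
-- fits archived data rather than data still being loaded.
ALTER DATABASE LabData MODIFY FILEGROUP FileStreamGroup READ_ONLY;

BACKUP DATABASE LabData
    READ_WRITE_FILEGROUPS
    TO DISK = 'E:\Backups\LabData_partial.bak';
```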

Microsoft recommends AGAINST using FILESTREAM to store files that average less than 1 MB, since below that size the efficiency gained by using the Win32 file APIs to modify the data is negated. The maximum size of a FILESTREAM file is limited only by the operating system's maximum file size, which for current versions of NTFS is for all practical purposes unlimited (16 exabytes). By design, SQL Server places no limit of its own on the size of a FILESTREAM object.

If the metadata you intend to store in SQL Server must be transactionally consistent with the object data stored via FILESTREAM, you should use FILETABLE/FILESTREAM. If the object data in these 10 GB+ files is not logically related to metadata stored in SQL Server, I would expect using FILESTREAM to be counterproductive.
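If you do go that way, a minimal FILETABLE setup with transactionally consistent metadata might look like this sketch (all object names are hypothetical):

```sql
-- A minimal sketch; all names hypothetical. FILESTREAM must already be
-- enabled on the instance, and the database must have a FILESTREAM
-- filegroup plus a database-level directory name:
ALTER DATABASE LabData
    SET FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'LabData');

-- The FILETABLE holds the raw files; SQL Server supplies its schema.
CREATE TABLE dbo.RawData AS FILETABLE
WITH (FILETABLE_DIRECTORY = 'RawData');

-- Metadata rows reference the file's stream_id, so a file and its
-- metadata can be inserted or deleted together in a single transaction.
CREATE TABLE dbo.RawDataInfo (
    stream_id  uniqueidentifier NOT NULL
        REFERENCES dbo.RawData (stream_id),
    MachineId  int       NOT NULL,
    CapturedAt datetime2 NOT NULL
);
```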

In short, it depends on your exact use-case. Add more details to your question about why you want to use a FILETABLE.

FILESTREAM docs at Microsoft: http://technet.microsoft.com/en-us/library/gg471497.aspx

[Edit]

In light of your requirement to store binary X-ray data along with the machine parameters used to capture it, this sounds like precisely the use-case for FILESTREAM. However, instead of archiving to tape, I'd consider using DataCore SANsymphony-V, a software SAN with multiple storage tiers. SANsymphony-V lets you expand your SAN easily and reliably with inexpensive commodity Windows-based servers, using 15k RPM enterprise drives (possibly with an SSD tier) for the main database, along with a tier of standard 7200 RPM SATA drives for the X-ray data. SANsymphony-V is SQL Server certified and would provide a relatively inexpensive way to keep both the new data and the older data accessible without any complicated data-management system. If disaster recovery and high availability are concerns, the entire SANsymphony-V SAN can be mirrored, either on-site or off-site at distances of up to around 100 km.

Just for clarity, I am not in any way associated with DataCore - I simply love their design.