We have a lab project with a requirement to keep the lab data (raw data and related information) available for research for years. The data is large: each raw data file is more than 10 GB. If we store the raw data in a SQL Server FILETABLE and the related information in a standard SQL Server table, then after months or years the database will grow so large that we will have to move some of the historical data out of it.
Perhaps we could add files (on a different hard disk) to the FILESTREAM filegroup that backs the FILETABLE storing the raw data, but I feel that complicates maintaining SQL Server. We must keep the raw data and the related information consistent.
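Concretely, the "add a file on another disk" idea would look something like this (SQL Server 2012 and later allow more than one container per FILESTREAM filegroup; the database, filegroup, and path names here are made up for illustration):

```sql
-- Add a second FILESTREAM container, on a different disk, to the
-- filegroup backing the FILETABLE. For FILESTREAM files, FILENAME is a
-- directory: the parent folder must exist, the last folder must not.
ALTER DATABASE LabDB
ADD FILE (
    NAME = LabRawData2,
    FILENAME = 'E:\LabRawData2'
)
TO FILEGROUP FS_RawData;
```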
We have also considered tape: keep the latest raw data on hard disk and move the historical raw data to tape. When we move historical data to tape we would record the move in a log table, so we always know where each historical file went and can retrieve it as quickly as possible.
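The move log we have in mind would be a plain table, roughly like the following (all names are hypothetical):

```sql
-- One row per raw-data file moved from disk to tape.
CREATE TABLE dbo.TapeArchiveLog (
    ArchiveLogId  int IDENTITY(1,1) PRIMARY KEY,
    StreamId      uniqueidentifier NOT NULL,  -- stream_id of the original FILETABLE row
    OriginalName  nvarchar(255) NOT NULL,     -- file name as it was on disk
    FileSizeBytes bigint NOT NULL,
    TapeLabel     nvarchar(64) NOT NULL,      -- which tape holds the file
    TapePosition  nvarchar(128) NULL,         -- locator on the tape, if the tape software provides one
    ArchivedAtUtc datetime2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
    RestoredAtUtc datetime2(0) NULL           -- set when the file is brought back to disk
);
```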
Is there good advice for:
- How to store the raw data and related information?
- Is FILETABLE suitable for this scenario?
- Is there a better solution for this scenario?
Best Answer
A FILETABLE is built on the FILESTREAM feature of SQL Server, which uses the Windows Server filesystem to store VARBINARY(MAX) data in discrete NTFS files on the filesystem of the server itself. This data is not stored inside the database files, so the database size will not include these 10 GB+ files. Database backups will, by default, include the FILESTREAM data; you can keep it out of SQL Server backups by using partial backups that exclude the FILESTREAM filegroups. FILESTREAM data is not cached by the SQL Server buffer pool; the operating system's file cache is used instead.

Microsoft recommends AGAINST using FILESTREAM to store files that are on average smaller than 1 MB, since that negates the efficiency gained by using the Win32 file APIs to modify the data. The maximum size of a FILESTREAM file is limited only by the operating system's maximum file size, which for current versions of NTFS is for all practical purposes unlimited (16 exabytes). By design, SQL Server places no limit of its own on the size of a FILESTREAM object.

If the metadata you intend to store in SQL Server must be transactionally consistent with the object data stored in FILESTREAM, you should use FILETABLE/FILESTREAM. If the object data in these 10 GB+ files is not logically related to metadata stored in SQL Server, I would expect using FILESTREAM might be counterproductive.

In short, it depends on your exact use case. Add more details to your question about why you want to use a FILETABLE.

FILESTREAM docs at Microsoft: http://technet.microsoft.com/en-us/library/gg471497.aspx
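To make the FILETABLE/FILESTREAM option concrete, here is a minimal sketch of the setup, ending with a filegroup backup that leaves the large FILESTREAM data out of the routine backup. All database, filegroup, and path names are hypothetical, and FILESTREAM must first be enabled at the instance level (sp_configure 'filestream access level'):

```sql
-- Database with a dedicated FILESTREAM filegroup for the raw files.
CREATE DATABASE LabDB
ON PRIMARY
    (NAME = LabDB_data, FILENAME = 'D:\SQLData\LabDB.mdf'),
FILEGROUP FS_RawData CONTAINS FILESTREAM
    (NAME = LabRawData1, FILENAME = 'D:\LabRawData1')
LOG ON
    (NAME = LabDB_log, FILENAME = 'D:\SQLData\LabDB.ldf')
WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'LabDB');
GO

USE LabDB;
GO

-- The FILETABLE exposes the raw files both through T-SQL and through a
-- Windows file share, while keeping them transactionally consistent
-- with metadata tables in the same database.
CREATE TABLE dbo.RawData AS FILETABLE
WITH (FILETABLE_DIRECTORY = N'RawData');
GO

-- Back up only the PRIMARY filegroup (the metadata); the FILESTREAM
-- filegroup, and therefore the 10 GB+ files, stays out of this backup.
BACKUP DATABASE LabDB
    FILEGROUP = 'PRIMARY'
    TO DISK = 'E:\Backups\LabDB_primary.bak';
```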
[Edit]
In light of your requirement to store binary X-ray data along with the machine parameters used to capture it, this sounds like precisely the use case for FILESTREAM. However, instead of archiving to tape, I'd consider using DataCore SANSymphony-V, a software SAN with multiple storage tiers. SANSymphony-V lets you easily and reliably expand your SAN with inexpensive commodity Windows-based servers, using 15k RPM enterprise drives (possibly with an SSD tier) for the main database along with a tier of standard 7200 RPM SATA drives for the X-ray data. SANSymphony-V is SQL Server certified and would provide a relatively inexpensive way of keeping both the new data and the older data accessible without any complicated data-management system. If disaster recovery and high availability are a concern, the entire SANSymphony-V SAN can be mirrored, either on-site or off-site at distances of up to around 100 km.

Just for clarity, I am not in any way associated with DataCore - I simply love their design.