Sql-server – Please enlighten me about FILESTREAM and BLOB files

blobfilestreamsql serversql-server-2008-r2varbinary

This is a really confusing topic for me.

I can understand what a BLOB is, I can easily use it.
I can understand what FILESTREAM is. I can easily implement in on a database, I can backup the database and restore it with the folder created to store these files. Things like these, I think, I can understand well.

What I don't get is:

  1. What's the difference between inserting the .PDF (for example) using just a VARBINARY(MAX) column in the database, and storing this .PDF in a FILESTREAM database. I know with the BLOB varbinary(max) the .PDF is INSIDE the database. If I physically delete the .PDF, can I still restore it using the value stored inside the varbinary(MAX) column right?

  2. When I insert this .PDF file in a filestream, I can see it inside the folder (the one created for the filegroup), if I insert 100 files, I can see 100 files inside the filestream folder, but when I run DELETE FROM and delete 100 rows in the database, I can still see these files in the filestream folder. Why does this happen? I thought that, to maintain consistency, these files would be deleted too.

  3. With both cases, after inserting the .PDF file inside the BLOB VARBINARY(MAX) or FILESTREAM, do I still need the original file? As it is inside the database, I can easily recover it.

I'm using .PDF as an example

I can use and implement both approaches but it still not clear to me what happens with them. For me, it's the same thing, but one is stored outside the database, and the other one inside the database.

Best Answer

Based on the whitepaper written by Paul Randal -

  • BLOBs smaller than 256 kilobytes (KB) (such as widget icons) are better stored inside a database.
  • BLOBs larger than 1 megabyte (MB) are best stored outside the database.
  • For those sized between 256 KB and 1 MB, the more efficient storage solution depends on the read vs. write ratio of the data, and the rate of “overwrite”. Storing BLOB data solely within the database (e.g., using the varbinary (max) data type) is limited to 2 gigabytes (GB) per BLOB.

Answering your questions :

What's the difference between inserting the .PDF (for example) using just a VARBINARY(MAX) column in the database, and storing this .PDF in a FILESTREAM database. I know with the BLOB varbinary(max) the .PDF is INSIDE the database. If I physically delete the .PDF, can I still restore it using the value stored inside the varbinary(MAX) column right?

Once you insert the pdf into the database as SINGLE_BLOB, you dont need the original .pdf file. You can export out if you need it at later time.

When I insert this .PDF file in a filestream, I can see it inside the folder (the one created for the filegroup), if I insert 100 files, I can see 100 files inside the filestream folder, but when I run DELETE FROM and delete 100 rows in the database, I can still see these files in the filestream folder. Why does this happen? I thought that, to maintain consistency, these files would be deleted too.

File Streams uses a garbage collection process to clean up files that are not longer needed. A system task wakes up periodically (~10 seconds or so) and checks for file stream garbage collection needs. Read "Why don't all the files get collected?"

DANGER: Deleting a file directly from the file stream container is considered database corruption and dbcc checkdb will report corruption errors. These files are linked to the database in a transactional way and the deletion of a file is the same as a damaged page within the database.

-

With both cases, after inserting the .PDF file inside the BLOB VARBINARY(MAX) or FILESTREAM, do I still need the original file?

For BLOB Varbinary(max) --> You can delete the original file. For Filestream, do not delete the file from the filestream folder. If the original file is not in the same folder than the filestream folder, then you can delete it.

References :