SQL Server – Moving or Redistributing FILESTREAM Files Between Containers

filestream, sql-server, sql-server-2012

We have a FILESTREAM container with a few million files, and we believe it is the cause of the performance issues we're experiencing (frequent timeouts).

According to this blog about FILESTREAM best practices, each container shouldn't reach more than 300,000 files.

According to the accepted answer here, there is no way to accomplish this other than by recreating the table in a new FILESTREAM location.

  1. Is this true for my situation?
  2. And if so, what is the recommended approach to ensuring that enough containers are created automatically to handle this without manual intervention every 300,000 files?

The structure of the table follows:

CREATE TABLE [dbo].[Documents](
    [ContentPath] [uniqueidentifier] ROWGUIDCOL  NOT NULL,
    [FileContent] [varbinary](max) FILESTREAM  NULL,
 CONSTRAINT [UQ_IX_Documents_ContentPath] UNIQUE NONCLUSTERED 
(
    [ContentPath] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY] FILESTREAM_ON [FSFileGroup_1]

We're using SQL Server 2012, but not Enterprise unfortunately (we only recently realized that Enterprise supports multiple containers per filegroup).

Although we never update records, we do a lot of writes, probably as many as we do reads. The pattern is: one row at a time, reads by ContentPath, and writes that land in no particular container or order.

Best Answer

A good option for you, I would imagine, is to split the table logically, so that every section sits in its own folder and holds under 300k files (= records).

In SQL Server a logical table split is usually done with partitioning (an Enterprise feature). Partitioning essentially maps sections of a table (partitions) to physical storage (filegroups). Different partitions can be assigned to different filegroups, and FILESTREAM is essentially a property of a filegroup.

Design

This makes up the spine of your solution:

  • Partitions reside on different filegroups
  • Each filegroup is a separate FILESTREAM folder/container
  • Size of each partition under 300k records

Here is how to partition a table with a FILESTREAM column.

Main advantage: you will be able to manage your FILESTREAM data one partition at a time, rather than the whole table.
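For illustration, a rough sketch of what that could look like. The object names, the number of partitions and the boundary values below are hypothetical; real boundaries should come from your data. Keep in mind that SQL Server sorts uniqueidentifier values by the last byte group first, which is why the boundaries vary in the last group.

-- Hypothetical partition function on the ROWGUIDCOL, RANGE RIGHT, 4 partitions
CREATE PARTITION FUNCTION PF_Documents (uniqueidentifier)
AS RANGE RIGHT FOR VALUES (
    '00000000-0000-0000-0000-400000000000',
    '00000000-0000-0000-0000-800000000000',
    '00000000-0000-0000-0000-C00000000000');

-- Scheme for the row data (all on PRIMARY for simplicity)
CREATE PARTITION SCHEME PS_Documents
AS PARTITION PF_Documents ALL TO ([PRIMARY]);

-- Matching scheme for the FILESTREAM data: one FILESTREAM filegroup
-- (= one folder/container here) per partition
CREATE PARTITION SCHEME PS_Documents_FS
AS PARTITION PF_Documents
TO ([FSFileGroup_1], [FSFileGroup_2], [FSFileGroup_3], [FSFileGroup_4]);

CREATE TABLE [dbo].[Documents_Partitioned](
    [ContentPath] [uniqueidentifier] ROWGUIDCOL NOT NULL
        DEFAULT (NEWSEQUENTIALID()),
    [FileContent] [varbinary](max) FILESTREAM NULL,
    CONSTRAINT [UQ_Documents_Partitioned_ContentPath]
        UNIQUE NONCLUSTERED ([ContentPath]) ON PS_Documents ([ContentPath])
) ON PS_Documents ([ContentPath])
  FILESTREAM_ON PS_Documents_FS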

You'll need:

  1. An ID that allows you to track the size of a partition, or at least prevents random writes across partitions

    [Id] [uniqueidentifier] ROWGUIDCOL NOT NULL DEFAULT (newsequentialid())
    
  2. A maintenance job that checks how full the last partition is, creates new filegroups and partitions in advance, rebuilds indexes per partition, etc. (a sketch of such a check is given below)

  3. To cater for insert operations, make your partitions smaller than 300k, so the index can be rebuilt one partition at a time.

This design suits a mostly read-only table with inserts happening at the end of it. Do consider your writes and updates and the impact of partitioning on them, if you have not done so already.
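As an illustration of point 2, here is a rough sketch of the kind of check such a job could run, assuming the hypothetical dbo.Documents_Partitioned table sketched above. One row corresponds to one file here, so the row count of the right-most partition is the number to watch.

-- How many rows (= files) are in the right-most partition?
SELECT TOP (1)
       p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID(N'dbo.Documents_Partitioned')
  AND p.index_id IN (0, 1)          -- heap or clustered index only
ORDER BY p.partition_number DESC;

-- If the count approaches your threshold (e.g. 250k), the job would add a new
-- FILESTREAM filegroup, mark it as NEXT USED and SPLIT the partition function,
-- as sketched in the Inserts section below.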

Migration

In order to make an existing table partitioned, you need to build its index on a partition scheme. There are two main ways:

  1. Rebuild the index for the whole table. This requires 2-3 times more space than the table itself and is not suitable for big tables, for a number of reasons.

  2. Migrate the data into a new table that is already partitioned.

My preferred method for this:

  • Create a table with the same structure as the target partitioned table.
  • Move data using DELETE... OUTPUT DELETED ... INTO <intermediary table> for the data that fits one partition.
  • If the number of rows is too big for one transaction, wrap this in a loop.
  • Create the index on the partition scheme.
  • Switch the partition into the target table.
  • Drop the index on the intermediary table.
  • Wrap in a loop and repeat for all partitions/data (a rough sketch follows below).
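A rough sketch of the per-partition move, with hypothetical names (dbo.Documents_Staging as the intermediary table, boundaries matching the hypothetical partition function above). Treat it as an outline under those assumptions, not a finished script.

-- Move one partition's worth of rows out of the source table in batches.
-- Note: the OUTPUT ... INTO target must not have CHECK constraints or enabled
-- triggers, so add the range CHECK and the unique index needed for SWITCH
-- only after the load has finished.
DECLARE @BoundaryLow  uniqueidentifier = '00000000-0000-0000-0000-000000000000';
DECLARE @BoundaryHigh uniqueidentifier = '00000000-0000-0000-0000-400000000000';

WHILE 1 = 1
BEGIN
    DELETE TOP (10000) d
    OUTPUT DELETED.ContentPath, DELETED.FileContent
    INTO dbo.Documents_Staging (ContentPath, FileContent)
    FROM dbo.Documents AS d
    WHERE d.ContentPath >= @BoundaryLow
      AND d.ContentPath <  @BoundaryHigh;

    IF @@ROWCOUNT = 0 BREAK;    -- nothing left in this range
END;

-- The staging table must sit on the same filegroups (including the FILESTREAM
-- filegroup) as the target partition and have matching indexes and constraints.
ALTER TABLE dbo.Documents_Staging
    SWITCH TO dbo.Documents_Partitioned PARTITION 1;   -- partition number is illustrative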

Inserts

Newly inserted rows should ideally go into the last partition. However, there is a catch with the NEWSEQUENTIALID() function:

Creates a GUID that is greater than any GUID previously generated by this function on a specified computer since Windows was started. After restarting Windows, the GUID can start again from a lower range, but is still globally unique.

It means that new writes will go into the last partition only until the first reboot of the server, and that will happen eventually. After a restart, new rows may be inserted into the middle of the table. But remember, we have partitions now, so only one partition at a time will be affected. Still, it will exceed 300k rows at some point.

The main option here is to split the partition. The existing partition is split in two, with the right part forming a new partition. The new partition should go onto a new filegroup. It will not be a metadata-only operation: files will be physically copied into the new partition.
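A rough sketch of such a split, again with hypothetical names (database MyDocsDb, the filegroups and schemes from the earlier sketch, an illustrative boundary value):

-- 1. Add a new FILESTREAM filegroup and a container (folder) for it.
--    FILENAME is a directory: its parent must exist, the directory itself must not.
ALTER DATABASE MyDocsDb ADD FILEGROUP FSFileGroup_5 CONTAINS FILESTREAM;
ALTER DATABASE MyDocsDb
    ADD FILE (NAME = 'FSContainer_5', FILENAME = 'D:\FS\Container5')
    TO FILEGROUP FSFileGroup_5;

-- 2. Tell both partition schemes which filegroup the new partition will use.
ALTER PARTITION SCHEME PS_Documents    NEXT USED [PRIMARY];
ALTER PARTITION SCHEME PS_Documents_FS NEXT USED FSFileGroup_5;

-- 3. Split the function at a new boundary. Rows above the boundary are moved
--    into the new partition; this moves data, it is not a metadata-only change.
ALTER PARTITION FUNCTION PF_Documents()
    SPLIT RANGE ('00000000-0000-0000-0000-E00000000000');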

A workaround would be moving rows out (delete and insert them into an intermediary table on the same partition scheme), splitting the partition, and moving the rows back. But it involves moving data at least once anyway: if the intermediary table is on the same partition scheme, the split itself will move data; if not, the data needs to be moved a second time back into the main table.

Multiple FILESTREAM containers

Multiple FILESTREAM containers are a feature of the Enterprise edition. A container is often referred to as a "folder", but it is actually a database file in a filegroup of FILESTREAM type.

It does help to stay under 300k files per NTFS folder, but it is very difficult to manage stored files across these folders explicitly.
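For completeness, a container is added like any other database file. A minimal sketch with a hypothetical database name and path (on SQL Server 2012, adding a second container to a filegroup needs Enterprise edition):

-- Add a second container (folder) to the existing FILESTREAM filegroup.
-- FILENAME is a directory: its parent must exist, the directory itself must not.
ALTER DATABASE MyDocsDb
    ADD FILE (NAME = 'FSContainer_1b', FILENAME = 'E:\FS\Container1b')
    TO FILEGROUP [FSFileGroup_1];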

The article Rebalancing data across files in a filegroup (2011) by Paul Randal gives a hint on how files in a filegroup are used:

[SQL Server] also uses an algorithm called proportional fill that aims to allocate data from the files according to how much free space they have relative to the other files in the filegroup.

...

This means that if you add a new data file to a filegroup that has mostly full data files, the proportional fill weightings will be such that the new file will be the one where allocations come from until it fills up to the same level as the older files. The new file in essence becomes an allocation hot spot.

SQL Server will fill newly added files first, until they reach the same percentage of fullness as the others.

It may look like an easier alternative to partitioning; however, it is not:

  1. You cannot do anything about files already stored in existing containers.
  2. You still have one big table that you need to manage within a single transaction

At the same time, you can blend multiple FILESTREAM containers into the partitioning solution to make the folders smaller and faster.

UPDATE for Standard Edition (SE)

SE has neither partitioning nor multiple FILESTREAM containers. It means we can have only one partition and one container per table.

One option would be to have a number of tables with a UNION ALL view on top.

  • The number of tables will be fixed most of the time. You already have quite a number of documents, so you can look at the distribution of the GUIDs and create tables over smaller intervals where the density is higher (at the lower end).
  • Create a view that UNION ALLs all the tables. Hint: in order to avoid scans of all tables, add a WHERE clause for the IDs belonging to each particular table. It helps the optimizer exclude tables where a particular ID cannot exist:

    SELECT * FROM Table1 WHERE ID BETWEEN 1 AND 100
    UNION ALL
    SELECT * FROM Table2 WHERE ID BETWEEN 101 AND 10000
    UNION ALL
    SELECT * FROM Table3 WHERE ID BETWEEN 10001 AND 100000000

  • Make sure that you cover all possible values of the GUID, from 0000-...-0000 to FFFF-...-FFFF.
  • This view will be updatable, so you will be able to insert data into the relevant table by using it. There are limitations to this: see Modifying Data Through a View.
  • You need to copy all data physically from the existing table to the new tables; this can be done via the master view.
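A rough sketch of the idea with just two tables; the names, the GUID boundary, the filegroups and the constraints are illustrative only, and inserting through the view is subject to the partitioned-view rules in the documentation linked above (otherwise insert directly into the right base table).

CREATE TABLE [dbo].[Documents_1](
    [ContentPath] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [FileContent] [varbinary](max) FILESTREAM NULL,
    CONSTRAINT [UQ_Documents_1_ContentPath] UNIQUE NONCLUSTERED ([ContentPath]),
    -- the range check lets the optimizer skip this table for out-of-range IDs
    CONSTRAINT [CK_Documents_1_Range]
        CHECK ([ContentPath] < '00000000-0000-0000-0000-800000000000')
) ON [PRIMARY] FILESTREAM_ON [FSFileGroup_1];

CREATE TABLE [dbo].[Documents_2](
    [ContentPath] [uniqueidentifier] ROWGUIDCOL NOT NULL,
    [FileContent] [varbinary](max) FILESTREAM NULL,
    CONSTRAINT [UQ_Documents_2_ContentPath] UNIQUE NONCLUSTERED ([ContentPath]),
    CONSTRAINT [CK_Documents_2_Range]
        CHECK ([ContentPath] >= '00000000-0000-0000-0000-800000000000')
) ON [PRIMARY] FILESTREAM_ON [FSFileGroup_2];
GO

-- The master view that unifies the tables
CREATE VIEW [dbo].[Documents_All]
AS
SELECT ContentPath, FileContent FROM dbo.Documents_1
UNION ALL
SELECT ContentPath, FileContent FROM dbo.Documents_2;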

P.S. SQL Server is an RDBMS first of all. Features for managing huge volumes of unstructured data, like VARBINARY(MAX), are add-ons and relatively new. Therefore there are other limitations that need to be taken into account:

  • Max number of partitions per table: 15,000
  • Max filegroups or files per database: 32,767
  • Max files per container: 300,000 (= filegroup in my example, = database file in the case of multiple containers per filegroup)

The size of the transaction log limits how much a single transaction can do, as does the time a transaction takes to complete. You need to keep individual operations manageable and possible to roll back in a reasonable time.