Sql-server – Query to find optimal filegroups for objects

filegroupssql server

I have been assigned a task to create four file groups (one per volume) locate tables so that they are located in a manner that during an ETL process no filegroup is being read and written to at the same time.

I have stepped through all the packages, data flow tasks and procedures and views and noted which tables are being read from and to and in what sequence.

I now need write a query to determine the optimal positioning of the tables.

In terms of logic I have the following, broken it down into the following rules.

For each sequence source table and destination table can not be on the same volume.
Tables can not exist on more than one volume.

I am interested to hear peoples opinion on whether this process is worthwhile but regardless I would appreciate some help resolving the query.

Below you can find the scripts to create the table and insert some test data.

Thanks in advance for any suggestions.

-- Create table
CREATE TABLE [dbo].[TestData]
(
    [TestDataID] [smallint] IDENTITY(1,1) NOT NULL,
    [Src] [nvarchar](50) NOT NULL,
    [Dest] [nvarchar](50) NOT NULL,
    [Seq] [int] NOT NULL,
CONSTRAINT PK_TestData PRIMARY KEY CLUSTERED (TestDataID)
)
GO

-- Insert data
INSERT INTO TestData(Src, Dest, seq)
 SELECT 'A', 'B', 1
 UNION ALL
 SELECT 'A', 'C', 1
 UNION ALL
 SELECT 'C', 'D', 2
 UNION ALL
 SELECT 'A', 'D', 3
 UNION ALL
 SELECT 'B', 'D', 3
GO

Best Answer

Bad news. You said ETL process - that means heavy data loads, and inserts/updates/deletes have a single disk bottleneck: the transaction log file. You can add multiple transaction log files, but they're used sequentially, not load-balanced (except in some really edge-case scenarios.)

Instead, what you need is multiple databases. In data warehouses, you typically use a different source database than a destination database, like a staging database. That way, you can put its databases on different drives (and even different classes of drives). For example, some staging databases live on cheap local SSDs.

If you do find yourself storage-bottlenecked on data file reads or writes, you're better off solving that problem by adding memory (to cache the data and avoid reads altogether) or by striping the data across more drives, not less. By splitting the data files into smaller pieces, and giving them each fewer drives, you're usually making the problem worse.