Sql-server – Error on SELECT – Primary filegroup is full

etlsql-server-2008-r2

A SQL Server 2008 R2 database view unions several transaction table. The query has been slow over the couple of months and the dev team is planning to recreate the query. I need to ETL the results into a flat file and purge them from the DB. When I tried to query the view (of 5M records) this error showed up:

Msg 1105, Level 17, State 2, Line 1
Could not allocate space for
object 'dbo.SORT temporary run storage: 140744470495232' in database
'tempdb' because the 'PRIMARY' filegroup is full. Create disk space by
deleting unneeded files, dropping objects in the filegroup, adding
additional files to the filegroup, or setting autogrowth on for
existing files in the filegroup.

What is the recommended file growth I should set, how is this error managed or solved?
Thanks in advance.

Best Answer

The quickest fix would be to restart the SQL Server and Tempdb will be recreated with default size and empty files.

But if it's a production server you can't really restart it when you want. A real fix would be to add a new file on a different drive and run your queries.

An example would be (new file of starting size 1 MB, increase 100 MB, limit 500 MB):

USE [master]
GO
ALTER DATABASE [tempdb] ADD FILE ( NAME = N'newtempfile', FILENAME = N'e:\newtempfile.ndf' , SIZE = 100MB, MAXSIZE = 500MB , FILEGROWTH = 100MB )
GO

And then, when you have time, check what's using Tempdb so much. But likely you'll still need space for TempDB, so you'd better plan space for this and assign enough space for this db, as it's very important for a healthy system (you can simply look at it as a RAM part).

PS1: check if you don't have a Cartesian product anywhere in your select statements, because that number seems a bit high.

PS2: If you have enough free space on the TempDB drive, check if the files didn't get to their limit and autogrowth is disabled. If yes, enable autogrowth (not by percent, but by some specific size you feel comfortable with).

PS3: a good solution would be to break that ETL process into smaller transactions. Instead of removing 1 Bil records at once, do it by 1000 mill .. or play with batch sizes until you feel safe with the length of the process and the size of the files. You'll likely obtain the result faster and not have huge increases in space at once.

Related Solutions

Sql-server – Restore database excluding FILESTREAM data

What you're trying to do would leave the database in a (transactionally) inconsistent state, hence it isn't possible.

The Partial Database Availability whitepaper is a useful reference guide and includes an example of how to check whether a particular table or file is online. If your data access were through stored procedures, you could relatively easily incorporate that check.

One alternative (but somewhat hacky) approach that might be worth a look in your scenario would be to hide the table and replace it with a view.

-- NB: SQLCMD script
:ON ERROR EXIT
:setvar DatabaseName "TestRename"
:setvar FilePath "D:\MSSQL\I3\Data\"

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
SET NOCOUNT ON;
GO

USE master;
GO

IF EXISTS (SELECT name FROM sys.databases WHERE name = N'$(DatabaseName)')
  DROP DATABASE $(DatabaseName)
GO

CREATE DATABASE $(DatabaseName) 
ON PRIMARY 
  (
  NAME = N' $(DatabaseName)'
  , FILENAME = N'$(FilePath)$(DatabaseName).mdf'
  , SIZE = 5MB
  , MAXSIZE = UNLIMITED
  , FILEGROWTH = 1MB
  ) 
, FILEGROUP [FG1] DEFAULT
  ( 
  NAME = N' $(DatabaseName)_FG1_File1'
  , FILENAME = N'$(FilePath)$(DatabaseName)_FG1_File1.ndf'
  , SIZE = 1MB
  , MAXSIZE = UNLIMITED
  , FILEGROWTH = 1MB 
  ) 
, FILEGROUP [FG2] CONTAINS FILESTREAM
  ( 
  NAME = N'$(DatabaseName)_FG2'
  , FILENAME = N'$(FilePath)Filestream'
  )
LOG ON 
  ( 
  NAME = N'$(DatabaseName)_log'
  , FILENAME = N'$(FilePath)$(DatabaseName)_log.ldf'
  , SIZE = 1MB
  , MAXSIZE = UNLIMITED
  , FILEGROWTH = 1MB
  )
GO

USE $(DatabaseName);
GO

CREATE TABLE [dbo].[BinaryContent](
    [BinaryContentID] [int] IDENTITY(1,1) NOT NULL
    , [FileName] [varchar](50) NOT NULL
    , [BinaryContentRowGUID] [uniqueidentifier] ROWGUIDCOL UNIQUE DEFAULT (NEWSEQUENTIALID()) NOT NULL
  , [FileContentBinary] VARBINARY(max) FILESTREAM  NULL
) ON [PRIMARY] FILESTREAM_ON [FG2]
GO 

-- Insert test rows
INSERT
  dbo.BinaryContent
  (
  [FileName]
  , [FileContentBinary]
  )
VALUES
  (
  CAST(NEWID() AS VARCHAR(36))
  , CAST(REPLICATE(NEWID(), 100) AS VARBINARY)
  );
GO 100

USE master;
GO

-- Take FILESTREAM filegroup offline
ALTER DATABASE $(DatabaseName)
MODIFY FILE (NAME = '$(DatabaseName)_FG2', OFFLINE)
GO

USE $(DatabaseName);
GO

-- Rename table to make way for view
EXEC sp_rename 'dbo.BinaryContent', 'BinaryContentTable', 'OBJECT';
GO

-- Create view to return content from table but with NULL FileContentBinary
CREATE VIEW dbo.BinaryContent
AS

SELECT
  [BinaryContentID]
    , [FileName] 
    , [BinaryContentRowGUID]
  , [FileContentBinary] = NULL
FROM
  [dbo].[BinaryContentTable];
GO

-- Check results as expected
SELECT TOP 10
  *
FROM
  dbo.BinaryContent;
GO

Sql-server – Bulk Data Loading and Transaction Log

My first comment is that you are doing an ELT (Extract, Load, Transform) rather than an ETL (Extract, Transform, Load). While ELTs leverage set based relational advantages and can be very fast, they are sometimes very write intensive (hard on storage). Specifically, the t-log. This is because the transform is done on disk (typically an update or insert). I prefer ETL when possible, as the transform is done in the buffer and, when done correctly, requires minimal t-log writes. Buffer is cheap. Fast storage is not. For some bulk operations, the t-log is a non-value adding bottleneck.

Here are a few things that you're doing but I wouldn't recommend.

Bulk loading to tempdb. I'd recommend doing the bulk load on a real table in the destination database. Then you can size you're files accordingly and not worry about impacting tempdb.
Bundling independent procedures together. Split these two procedures up. The bulk load and the merge are independent of each other. Splitting them into individual procedures makes them more modular / unit testable.

It looks like you have the minimal logging rules covered pretty well. You're loading to an empty B-Tree with no non-clustereds, using tf 610, the ordering key specified, in bulk-logged mode. Outside of the temp table, everything looks ok here. As long as the file is actually ordered by the key, you should be good. Are you popping the log on tempdb or the user database?

On the merge statement:

UPDATES will always be fully logged. Are you changing a pretty significant portion of your table? If so, you might consider doing the merge in memory (SSIS data flow task or .Net) then bulk loading into a new table. This is more work, but most of the work is done in the buffer and minimal t-log is used. A minimally logged insert can be faster than a fully logged update if the portion of change is significant.

Since you're using tf 610, the insert can minimally log when using a tablock hint. See here for more info on merge with tablock: http://blogs.msdn.com/b/sqlserverstorageengine/archive/2010/06/03/minimal-logging-and-merge-statement.aspx Note, the update will still be fully logged if you go this route.

Best Answer

Related Solutions

Sql-server – Restore database excluding FILESTREAM data

Sql-server – Bulk Data Loading and Transaction Log

Related Question