From http://www.sqlservercentral.com/scripts/tempdb/72007/
;WITH task_space_usage AS (
    -- SUM alloc/dealloc pages per task
    SELECT session_id,
           request_id,
           SUM(internal_objects_alloc_page_count)   AS alloc_pages,
           SUM(internal_objects_dealloc_page_count) AS dealloc_pages
    FROM sys.dm_db_task_space_usage WITH (NOLOCK)
    WHERE session_id <> @@SPID
    GROUP BY session_id, request_id
)
SELECT TSU.session_id,
       TSU.alloc_pages * 1.0 / 128   AS [internal object MB space],   -- 8 KB pages: 128 pages = 1 MB
       TSU.dealloc_pages * 1.0 / 128 AS [internal object dealloc MB space],
       EST.text,
       -- Extract the current statement from the batch text
       ISNULL(
           NULLIF(
               SUBSTRING(
                   EST.text,
                   ERQ.statement_start_offset / 2 + 1,                -- offsets are bytes; SUBSTRING is 1-based
                   CASE WHEN ERQ.statement_end_offset = -1            -- -1 = through end of batch
                        THEN DATALENGTH(EST.text)
                        WHEN ERQ.statement_end_offset < ERQ.statement_start_offset
                        THEN 0
                        ELSE ( ERQ.statement_end_offset - ERQ.statement_start_offset ) / 2
                   END
               ), ''
           ), EST.text
       ) AS [statement text],
       EQP.query_plan
FROM task_space_usage AS TSU
INNER JOIN sys.dm_exec_requests AS ERQ WITH (NOLOCK)
    ON TSU.session_id = ERQ.session_id
   AND TSU.request_id = ERQ.request_id
OUTER APPLY sys.dm_exec_sql_text(ERQ.sql_handle) AS EST
OUTER APPLY sys.dm_exec_query_plan(ERQ.plan_handle) AS EQP
WHERE EST.text IS NOT NULL OR EQP.query_plan IS NOT NULL
ORDER BY 3 DESC;
EDIT
As Martin pointed out in a comment, this will not find active transactions that are occupying space in tempdb; it will only find active queries that are currently using space there (and the likely culprits for current log usage). So there could be an open transaction, but the query that actually caused the problem is no longer running.
You could change the inner join on sys.dm_exec_requests to a left outer join; then you will also get rows for sessions that aren't currently running queries.
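A minimal sketch of that variation; note that the IS NOT NULL filter from the original has to go as well, otherwise idle sessions (whose text and plan come back NULL) are filtered right back out. The statement-extraction column is omitted for brevity:
;WITH task_space_usage AS (
    SELECT session_id,
           request_id,
           SUM(internal_objects_alloc_page_count)   AS alloc_pages,
           SUM(internal_objects_dealloc_page_count) AS dealloc_pages
    FROM sys.dm_db_task_space_usage WITH (NOLOCK)
    WHERE session_id <> @@SPID
    GROUP BY session_id, request_id
)
SELECT TSU.session_id,
       TSU.alloc_pages * 1.0 / 128   AS [internal object MB space],
       TSU.dealloc_pages * 1.0 / 128 AS [internal object dealloc MB space],
       EST.text,
       EQP.query_plan
FROM task_space_usage AS TSU
LEFT OUTER JOIN sys.dm_exec_requests AS ERQ WITH (NOLOCK)  -- was INNER JOIN
    ON TSU.session_id = ERQ.session_id
   AND TSU.request_id = ERQ.request_id
OUTER APPLY sys.dm_exec_sql_text(ERQ.sql_handle) AS EST
OUTER APPLY sys.dm_exec_query_plan(ERQ.plan_handle) AS EQP
ORDER BY TSU.alloc_pages DESC;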
The query Martin posted...
SELECT tdt.database_transaction_log_bytes_reserved, tst.session_id
FROM sys.dm_tran_database_transactions AS tdt
INNER JOIN sys.dm_tran_session_transactions AS tst
    ON tdt.transaction_id = tst.transaction_id
WHERE tdt.database_id = 2; -- 2 = tempdb
...would identify session_ids with active transactions that are occupying log space, but you wouldn't necessarily be able to determine the query that actually caused the problem: if it isn't running now, it won't be captured by the active-requests query above. You may be able to check the most recent query on the session reactively using DBCC INPUTBUFFER, but it may not tell you what you want to hear. You can outer join in a similar way to capture those still actively running, e.g.:
SELECT tdt.database_transaction_log_bytes_reserved,
       tst.session_id,
       t.[text],
       [statement] = COALESCE(NULLIF(
           SUBSTRING(
               t.[text],
               r.statement_start_offset / 2 + 1,
               CASE WHEN r.statement_end_offset = -1            -- -1 = through end of batch
                    THEN DATALENGTH(t.[text])
                    WHEN r.statement_end_offset < r.statement_start_offset
                    THEN 0
                    ELSE ( r.statement_end_offset - r.statement_start_offset ) / 2
               END
           ), ''
       ), t.[text])
FROM sys.dm_tran_database_transactions AS tdt
INNER JOIN sys.dm_tran_session_transactions AS tst
    ON tdt.transaction_id = tst.transaction_id
LEFT OUTER JOIN sys.dm_exec_requests AS r
    ON tst.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.plan_handle) AS t -- dm_exec_sql_text accepts sql_handle or plan_handle
WHERE tdt.database_id = 2; -- tempdb
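As an aside, the reactive DBCC INPUTBUFFER check mentioned above looks like this (52 is a hypothetical session_id pulled from the output of the queries above):
DBCC INPUTBUFFER(52);
-- or, on more recent versions, the DMF equivalent:
SELECT * FROM sys.dm_exec_input_buffer(52, NULL);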
You can also use the DMV sys.dm_db_session_space_usage to see overall space utilization by session (but again, if the query that allocated the space is no longer active, what you get back may not be the actual culprit).
;WITH s AS
(
    SELECT
        s.session_id,
        [pages] = SUM(s.user_objects_alloc_page_count
                    + s.internal_objects_alloc_page_count)
    FROM sys.dm_db_session_space_usage AS s
    GROUP BY s.session_id
    HAVING SUM(s.user_objects_alloc_page_count
             + s.internal_objects_alloc_page_count) > 0
)
SELECT s.session_id,
       s.[pages],
       t.[text],
       [statement] = COALESCE(NULLIF(
           SUBSTRING(
               t.[text],
               r.statement_start_offset / 2 + 1,
               CASE WHEN r.statement_end_offset = -1            -- -1 = through end of batch
                    THEN DATALENGTH(t.[text])
                    WHEN r.statement_end_offset < r.statement_start_offset
                    THEN 0
                    ELSE ( r.statement_end_offset - r.statement_start_offset ) / 2
               END
           ), ''
       ), t.[text])
FROM s
LEFT OUTER JOIN sys.dm_exec_requests AS r
    ON s.session_id = r.session_id
OUTER APPLY sys.dm_exec_sql_text(r.plan_handle) AS t
ORDER BY s.[pages] DESC;
With all of these queries at your disposal, you should be able to narrow down who is using up tempdb and how, especially if you catch them in the act.
Some tips for minimizing tempdb utilization:
- use fewer #temp tables and @table variables
- minimize concurrent index maintenance, and avoid the SORT_IN_TEMPDB option if it isn't needed
- avoid unnecessary cursors; avoid static cursors if you think this may be a bottleneck, since static cursors use work tables in tempdb
- try to avoid spools (e.g. large CTEs that are referenced multiple times in the query)
- don't use MARS
- thoroughly test the use of snapshot / RCSI isolation levels - don't just turn it on for all databases because you've been told it's better than NOLOCK (it is, but it isn't free)
- in some cases, it may sound counterintuitive, but use more temp tables. E.g. breaking up a humongous query into parts may be slightly less efficient, but if it avoids a huge spill to tempdb because the single, larger query requires too large a memory grant, it can be worth it (see the sketch after this list)
- avoid enabling triggers for bulk operations
- avoid overuse of LOB types (max types, XML, etc.) as local variables
- keep transactions short and sweet
- don't set tempdb to be everyone's default database
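To illustrate the "use more temp tables" tip, a contrived sketch; the tables and columns (dbo.BigSales, dbo.BigCustomers, etc.) are made up for illustration:
-- Instead of one giant join + aggregate whose memory grant may spill to tempdb,
-- stage the reduced intermediate result first, then finish the work:
SELECT s.CustomerID, SUM(s.Amount) AS TotalAmount
INTO #stage
FROM dbo.BigSales AS s
WHERE s.SaleDate >= '20120101'
GROUP BY s.CustomerID;

SELECT c.CustomerName, st.TotalAmount
FROM #stage AS st
INNER JOIN dbo.BigCustomers AS c
    ON c.CustomerID = st.CustomerID
ORDER BY st.TotalAmount DESC;

DROP TABLE #stage;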
You may also consider that your tempdb log usage may be caused by internal processes that you have little or no control over - for example database mail, event notifications, query notifications and service broker all use tempdb in some way. You can stop using these features, but if you're using them you can't dictate how and when they use tempdb.
Best Answer
Bulk inserting into a CLUSTERED COLUMNSTORE index shouldn't, by itself, be blowing up tempdb. Rows are read and compressed into rowgroups in chunks of about a million rows, so there is no reason for long-term tempdb usage.
I believe the problem is the nonclustered index ix_SRR_JobId on temp.SRR (JobId).
First try moving the index creation to after the bulk insert:
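A sketch of that reordering, using the index and table named above (the bulk insert itself is left as a placeholder, since the actual load statement isn't shown in the question):
-- Drop the nonclustered index before the load (guarded so the script is re-runnable):
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE name = N'ix_SRR_JobId'
             AND object_id = OBJECT_ID(N'temp.SRR'))
    DROP INDEX ix_SRR_JobId ON temp.SRR;

-- <bulk insert into temp.SRR goes here>

-- Recreate it once, after the data is in place:
CREATE NONCLUSTERED INDEX ix_SRR_JobId ON temp.SRR (JobId);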
This will substantially reduce tempdb pressure.
If tempdb is still blowing up, then try loading the data in smaller chunks. How you partition the data doesn't really matter, but it should be based on the leading column(s) of an existing clustered index (unique or not). The chunks should be small enough to load comfortably but large enough to avoid excessive "delta rowgroups" (see Columnstore Indexes: Data Loading Guidance).
As a starting point, aim for 10-20 chunks of equal size. If the base table is 2 TB, then these would be 100-200 GB. Adjust up or down until you get it to work.
For example, if column1 is a datetime, then try loading by year or quarter or month.
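A sketch of month-by-month chunking, assuming a hypothetical dbo.Source / dbo.Target pair and the datetime column column1 mentioned above (the date range is made up):
DECLARE @start datetime = '20120101',
        @end   datetime;

WHILE @start < '20130101'  -- hypothetical overall range
BEGIN
    SET @end = DATEADD(MONTH, 1, @start);

    -- TABLOCK helps with minimal logging and, on newer versions, parallel insert
    INSERT dbo.Target WITH (TABLOCK)
    SELECT *
    FROM dbo.Source
    WHERE column1 >= @start
      AND column1 <  @end;

    SET @start = @end;
END;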