SQL Server – What to do when you need a cursor and you have a clustered columnstore index

columnstore, cursors, sql-server, sql-server-2014

I have a very large SQL Server table with a clustered columnstore index. The table is very fast to query, which is good. However, I need to update a large portion of the records in the table. When I tried a conventional update statement, my transaction log filled up. Normally I would handle this with a cursor that iterates over part of the composite key, so I can run the updates in a loop, one batch at a time. However, the error message below has just taught me that this won't work on a table with a clustered columnstore index.

Can someone please suggest an alternative method for updating the data?

Should I just turn transaction logging off? That worries me.

One thing I can't do is add more space to my transaction log. It's already quite large.

declare @ig as cursor;
declare @pname varchar(200);

-- loop over one injection_group at a time so each update is a smaller batch
set @ig = cursor for
    select distinct injection_group
    from mytable
    where ProductGroup = 'Widget'
    and target_month = '1/1/2018';

open @ig;

fetch next from @ig into @pname;

while @@FETCH_STATUS = 0
begin
    print @pname;

    -- collapse double spaces in injection_group for the current batch
    update mytable
    set injection_group = replace(injection_group, '  ', ' ')
    where productgroup = 'Widget'
    and target_month = '1/1/2018'
    and injection_group = @pname;

    fetch next from @ig into @pname;
end

close @ig;
deallocate @ig;

Msg 35370, Level 16, State 1, Line 5
Cursors are not supported on a table which has a clustered columnstore index.

Best Answer

This answer is written from the point of view of SQL Server 2016, but I believe that most of the important details are the same in SQL Server 2014.

Updates are extremely expensive from a transaction logging point of view. Behind the scenes they are implemented as a delete plus an insert. The delete has a fixed transaction log cost per row; if I remember correctly it's about 200 MB per 1 million rows deleted. The inserted rows can only go to the delta store, which is stored as a B-tree rather than as compressed columnstore segments. Delta stores are uncompressed in SQL Server 2016 and page-compressed in SQL Server 2014.
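If you want to measure this yourself, you can compare transaction log usage before and after an update batch. A minimal sketch against the mytable from the question; the sys.dm_db_log_space_usage DMV is real, while the 100,000-row batch size is an arbitrary choice:

-- snapshot log usage before the update
select used_log_space_in_bytes
from sys.dm_db_log_space_usage;

-- update a small, fixed-size slice of the table
update top (100000) mytable
set injection_group = replace(injection_group, '  ', ' ')
where productgroup = 'Widget'
and target_month = '1/1/2018';

-- snapshot again; the difference is the log generated by ~100k updated rows
select used_log_space_in_bytes
from sys.dm_db_log_space_usage;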

It's a best practice to do delete + insert instead of update against columnstore indexes. If you're truly processing most of the rows in the table, you could instead insert all of the new data into a new copy of the table and swap the tables when you're done. As long as each insert batch contains at least 102,400 rows, the inserted rows bypass the delta store and are written directly to compressed rowgroups, which generates significantly less transaction log than writing to the delta store. A sketch of this approach follows.
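Here is a minimal sketch of the table-swap approach. The column definitions are assumptions based on the question's code, and mytable_new, mytable_old, and cci_mytable_new are made-up names:

-- build a fresh copy of the table (column definitions are assumed)
create table dbo.mytable_new (
    ProductGroup varchar(200) not null,
    target_month date not null,
    injection_group varchar(200) null
);

create clustered columnstore index cci_mytable_new on dbo.mytable_new;

-- load the transformed data; batches of 102,400+ rows compress directly
-- and skip the delta store, and TABLOCK allows a parallel load on 2016
insert into dbo.mytable_new with (tablock)
select
    ProductGroup,
    target_month,
    case when ProductGroup = 'Widget' and target_month = '1/1/2018'
         then replace(injection_group, '  ', ' ')
         else injection_group
    end
from dbo.mytable;

-- swap the tables once the new copy is verified
begin transaction;
exec sp_rename 'dbo.mytable', 'mytable_old';
exec sp_rename 'dbo.mytable_new', 'mytable';
commit;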

In SQL Server 2016 this approach can significantly improve performance, and according to Niko's article the same is true for SQL Server 2014.
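Once the load finishes, you can verify that the rows landed in compressed rowgroups rather than the delta store. A sketch using the sys.column_store_row_groups catalog view, which exists in both SQL Server 2014 and 2016:

-- COMPRESSED rowgroups took the cheap path; OPEN or CLOSED rowgroups are
-- delta stores that still have to be compressed later by the tuple mover
select state_description,
       count(*) as row_groups,
       sum(total_rows) as total_rows
from sys.column_store_row_groups
where object_id = object_id('dbo.mytable')
group by state_description;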