SQL Server – What to do when you need a cursor and you have a clustered columnstore index

columnstore, cursors, sql-server, sql-server-2014

I have a very large SQL Server table with a clustered columnstore index. The table is very fast to query, which is good. However, I need to update a large portion of the records in the table. When I tried a conventional update statement, my transaction log filled up. Normally I would handle this with a cursor that iterates over part of the composite key, so I can run the updates in a loop, one batch at a time. However, the error message below has just taught me that this won't work on a table with a clustered columnstore index.

Can someone please suggest an alternative method for updating the data?

Should I just turn transaction logging off? That worries me.

One thing I can't do is add more space to my transaction log. It's already quite large.

declare @ig as cursor;
declare @pname varchar(200);

-- loop over one injection_group at a time so each update is a smaller batch
set @ig = cursor for
    select distinct injection_group
    from mytable
    where ProductGroup = 'Widget'
    and target_month = '1/1/2018';

open @ig;

fetch next from @ig into @pname;

while @@FETCH_STATUS = 0
begin
    print @pname;

    -- collapse double spaces in injection_group for the current batch
    update mytable
    set injection_group = replace(injection_group, '  ', ' ')
    where productgroup = 'Widget'
    and target_month = '1/1/2018'
    and injection_group = @pname;

    fetch next from @ig into @pname;
end

close @ig;
deallocate @ig;

Msg 35370, Level 16, State 1, Line 5
Cursors are not supported on a table which has a clustered columnstore index.

Best Answer

This answer is written from the point of view of SQL Server 2016, but I believe that most of the important details are the same in SQL Server 2014.

Updates are extremely expensive from a transaction logging point of view. Behind the scenes they are implemented as a delete plus an insert. The delete has a fixed transaction log cost per row; if I remember correctly it's about 200 MB per 1 million rows deleted. The inserted rows can only go to the delta store, which is stored as a B-tree rather than as compressed columnstore segments. Delta stores are uncompressed in SQL Server 2016 and page-compressed in SQL Server 2014.
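If you want to measure this yourself, you can compare transaction log usage before and after an update batch. A minimal sketch against the mytable from the question; the sys.dm_db_log_space_usage DMV is real, while the 100,000-row batch size is an arbitrary choice:

-- snapshot log usage before the update
select used_log_space_in_bytes
from sys.dm_db_log_space_usage;

-- update a small, fixed-size slice of the table
update top (100000) mytable
set injection_group = replace(injection_group, '  ', ' ')
where productgroup = 'Widget'
and target_month = '1/1/2018';

-- snapshot again; the difference is the log generated by ~100k updated rows
select used_log_space_in_bytes
from sys.dm_db_log_space_usage;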

It's a best practice to do delete + insert instead of update against columnstore indexes. If you're truly processing most of the rows in the table, you could instead insert all of the new data into a new copy of the table and swap the tables when you're done. As long as each insert batch contains at least 102,400 rows, the inserted rows bypass the delta store and are written directly to compressed rowgroups, which generates significantly less transaction log than writing to the delta store. A sketch of this approach follows.
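Here is a minimal sketch of the table-swap approach. The column definitions are assumptions based on the question's code, and mytable_new, mytable_old, and cci_mytable_new are made-up names:

-- build a fresh copy of the table (column definitions are assumed)
create table dbo.mytable_new (
    ProductGroup varchar(200) not null,
    target_month date not null,
    injection_group varchar(200) null
);

create clustered columnstore index cci_mytable_new on dbo.mytable_new;

-- load the transformed data; batches of 102,400+ rows compress directly
-- and skip the delta store, and TABLOCK allows a parallel load on 2016
insert into dbo.mytable_new with (tablock)
select
    ProductGroup,
    target_month,
    case when ProductGroup = 'Widget' and target_month = '1/1/2018'
         then replace(injection_group, '  ', ' ')
         else injection_group
    end
from dbo.mytable;

-- swap the tables once the new copy is verified
begin transaction;
exec sp_rename 'dbo.mytable', 'mytable_old';
exec sp_rename 'dbo.mytable_new', 'mytable';
commit;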

In SQL Server 2016 this approach can significantly improve performance, and according to Niko's article the same is true for SQL Server 2014.
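Once the load finishes, you can verify that the rows landed in compressed rowgroups rather than the delta store. A sketch using the sys.column_store_row_groups catalog view, which exists in both SQL Server 2014 and 2016:

-- COMPRESSED rowgroups took the cheap path; OPEN or CLOSED rowgroups are
-- delta stores that still have to be compressed later by the tuple mover
select state_description,
       count(*) as row_groups,
       sum(total_rows) as total_rows
from sys.column_store_row_groups
where object_id = object_id('dbo.mytable')
group by state_description;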