Sql-server – Nonclustered Index Rebuild Online on Heap Table

enterprise-editionheapnonclustered-indexsql serversql-server-2012

Recently read that rebuilding a heap online in SQL Server Enterprise is single threaded and does not go parallel.

My question is: If you try to rebuild online just non-clustered indexes on a heap table with those also be single threaded or can those go parallel?

Background

I am trying to find the best place to actually test it, but right now I have limited hardware available to me. I am new to the company and I am not advocating a heap, I am trying to steer them towards clustered indexes, but they have random GUIDs everywhere and lots of the developers think clustered indexes will hurt performance. I am trying to educate them on the advantages of using clustered indexes rather than heaps and one of those components is index maintenance.

We had one stored procedure that was doing over 27 million logical reads. I went in and looked at fragmentation on one of the key table scans, rebuilt the heap and logical reads went down to 560,000. Just looking at forwarded records and fragmentation I would not have even rebuilt that table through a scripted job. Trying to make them understand if we are gonna keep heaps we need to perform regular maintenance on these things.

The cost of this maintenance is higher for rebuilding heaps than having clustered indexes. That is really where my question about single threaded is coming from. Sorry to be long winded but thought would help explain question.

Best Answer

My question is: If you try to rebuild online just non-clustered indexes on a heap table with those also be single threaded or can those go parallel?

It can be multi threaded, the restriction is ONLY to alter table rebuild command when done with ONLINE option. I would like to quote from Enabling Compression on Heap MSDN Blog

(1) If you use ONLINE option then HEAP rebuild is single threaded. Note, this restriction is not there when you are rebuilding an index which supports multi-threaded ONLINE build. However, in both cases, OFFLINE build does support multi-threading.

A non-clustered index rebuild can be multi threaded. Please note for index rebuild to use multiple threads the edition should be Enterprise edition or above( if there is such a edition). This is still true from SQL Server 2005 to SQL Server 2019.

For example, rebuilding a heap with three nonclustered indexes online:

ALTER TABLE dbo.Fish REBUILD WITH (ONLINE = ON);

The heap rebuild is serial:

...while the associated nonclustered index rebuilds are parallel:

Related Solutions

Sql-server – Why not rebuild indexes with page count <1000

The guidance concerning the minimum number of pages is somewhat arbitrary. The biggest benefits of reducing fragmentation are:

It can improve read-ahead performance for large range scans; and
It may improve the page density (number of rows per page)

Both of these factors are less important for small indexes, by definition.

The counter-argument to rebuilding small indexes is essentially:

"Why bother? Don't you have more important things to worry about?".

That said, rebuilding/reorganizing is not free. It may be worth avoiding the extra effort and log generation in some cases (for example, if the log is shipped/copied across a WAN for any number of possible reasons - mirroring, availability groups, replication...). Also, unless (or even if, in some cases) you are rebuilding online, the rebuild may impact other concurrent processes through locking. Finally, for small indexes, rebuilding may not even reduce the fragmentation anyway, due to allocations from mixed extents (unless you are running with trace flag 1118 enabled).

If you still feel happier rebuilding these small indexes, and don't mind the consequences, then by all means change the value of the @PageCountLevel parameter passed to Ola's procedure.

See the PASS TV recording of Paul Randal's presentation on Index Fragmentation for all the details.

You might also like to watch Brent Ozar talk about Why Index Fragmentation doesn't matter in SQL Server.

Sql-server – Howto Identify Clustered Index Candidates

Original Post

Honestly, in your situation I'd just go ahead and cluster the [ID]. It is worrying that there appears to be no default on it, but at the same time, maybe if you're lucky, it's being populated with some form of NEWSEQUENTIALID call. If not, it isn't the end of the world, < 1MM rows shouldn't be a show stopper if the indexes / stats are being maintained.

Doing so will allow you to start focusing on reworking some filtered / include indexes and when you're done with that, maybe you'll have 4 sore spots instead of 12.

Any advice beyond that will probably result in "tip of the iceberg" classification for your problem - as you seem to be well aware, the question is really about abysmal design, rather than engine performance.

Update

I couldn't figure out a way to answer your comment clearly, so here's way too much information instead.

As it has been stated, if you have a thin, accessible, incremental columns, like an IDENTITY column, put a clustered index on that - it's a no-brainer.

However:

If no such column exists and you're looking at heaps with 10+ indexes on them ( and -still- ending up doing RID lookups!! ), just cluster the ID for now, even if it's a UNIQUEIDENTIFIER. You need, more than anything else, to clean up your storage utilization, and that's going to start with clustering -something-, if only to pull that forward fetch data back in line with rest of the row. Even if you immediately drop the clustered index afterwards, you'd at least have organized your heap for now, dramatically reducing forward pointers ( if not completely eliminating ) and ultimately reducing contention on your ( needlessly overworked ) I/O subsystem. There is literally no downside to doing this, so do it. Then you can go about figuring out why there 29 available indexes and queries still need to go RID matching.

Ignoring the index sprawl and focusing on just clustering your heaps, using a FILLFACTOR around 70 should be fine for dealing with UNIQUEIDENTIFIER values. This is because b-trees, the actual data structure SQL Server uses for indexes, tend towards around 69% node utilization under INSERT only operation of random values. This number is a mathematical truth a couple of pretty bright fellas, Lehman and Yao, proved during their research into the subject. Throwing DELETEs in sufficient quantities into the mix can lower the node utilization target, but as a general rule, a FILLFACTOR of 70 is a good place to start tuning non-sequential / random clustered indexes, since you're effectively telling SQL Server to not even bother trying to fill 30% of each page, since you won't be using that part anyway - because you won't be, mathematically.

In regards to potential page splits using a non-optimal clustering key, while it is a valid concern, if you happen to have access to a Sql Server 2012+ installation, you can follow along with this demonstration, starting with setting up an extended event session to track page splits:

DROP EVENT SESSION [TrackPageSplits]
ON  SERVER;
GO

CREATE EVENT SESSION [TrackPageSplits]
ON    SERVER
ADD EVENT sqlserver.transaction_log(
    WHERE operation = 11  -- LOP_DELETE_SPLIT 
      AND database_id = 2 -- Watch TempDB;
)
ADD TARGET package0.histogram(
    SET filtering_event_name = 'sqlserver.transaction_log',
        source_type = 0, -- Event Column
        source = 'alloc_unit_id');
GO

-- Start the Event Session Again
ALTER EVENT SESSION [TrackPageSplits]
ON SERVER
STATE=START;
GO

Once that's up and running, you can set up and seed a few tables to get an idea of the impact different kinds of clustering keys under differing fill factors can have on index density and the number of page splits inserts into them can cause. For the example, I'm using a good clustering example with INTEGER and IDENTITY under FILL_FACTOR 100, a terrible clustering example with UNIQUEIDENTIFIER and non-sequential NEWID() under FILL_FACTOR 100, and then three more non-sequential UNIQUEIDENTIFIER clustering keys under FILL_FACTORs of 75, 70 and 65, each seeded with 2.5MM records.

USE tempdb;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'PageSplitIdentity'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.PageSplitIdentity;
    CREATE TABLE dbo.PageSplitIdentity
    (
        PageSplitIdentity_PK    INTEGER IDENTITY( 1, 1 ) NOT NULL,
        Foo                     VARBINARY( 512 ) NOT NULL
    );

    ALTER TABLE dbo.PageSplitIdentity
    ADD CONSTRAINT PK__PageSplitIdentity
        PRIMARY KEY CLUSTERED ( PageSplitIdentity_PK )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 )
    ON  [PRIMARY];

    ALTER TABLE dbo.PageSplitIdentity
    ADD CONSTRAINT DF__PageSplitIdentity__Foo
        DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;

    SET NOCOUNT ON;
    DECLARE @i                      INTEGER = 0;
    WHILE ( @i < 2500000 )
    BEGIN
        INSERT INTO dbo.PageSplitIdentity DEFAULT VALUES;
        SET @i = @i + 1;
    END;
    SET NOCOUNT OFF;
END;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'PageSplitNewID'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.PageSplitNewID;
    CREATE TABLE dbo.PageSplitNewID
    (
        PageSplitNewID_PK       UNIQUEIDENTIFIER NOT NULL,
        Foo                     VARBINARY( 512 ) NOT NULL
    );

    ALTER TABLE dbo.PageSplitNewID
    ADD CONSTRAINT PK__PageSplitNewID
        PRIMARY KEY CLUSTERED ( PageSplitNewID_PK )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 100 )
    ON  [PRIMARY];

    ALTER TABLE dbo.PageSplitNewID
    ADD CONSTRAINT DF__PageSplitNewID__PageSplitNewID_PK
        DEFAULT NEWID() FOR PageSplitNewID_PK;

    ALTER TABLE dbo.PageSplitNewID
    ADD CONSTRAINT DF__PageSplitNewID__Foo
        DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;

    SET NOCOUNT ON;
    DECLARE @i                      INTEGER = 0;
    WHILE ( @i < 2500000 )
    BEGIN
        INSERT INTO dbo.PageSplitNewID DEFAULT VALUES;
        SET @i = @i + 1;
    END;
    SET NOCOUNT OFF;
END;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'PageSplitNewIDFillFactor75'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.PageSplitNewIDFillFactor75;
    CREATE TABLE dbo.PageSplitNewIDFillFactor75
    (
        PageSplitNewIDFillFactor75_PK
                                UNIQUEIDENTIFIER NOT NULL,
        Foo                     VARBINARY( 512 ) NOT NULL
    );

    ALTER TABLE dbo.PageSplitNewIDFillFactor75
    ADD CONSTRAINT PK__PageSplitNewIDFillFactor75
        PRIMARY KEY CLUSTERED ( PageSplitNewIDFillFactor75_PK )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 75 )
    ON  [PRIMARY];

    ALTER TABLE dbo.PageSplitNewIDFillFactor75
    ADD CONSTRAINT DF__PageSplitNewIDFillFactor75__PageSplitNewIDFillFactor75_PK
        DEFAULT NEWID() FOR PageSplitNewIDFillFactor75_PK;

    ALTER TABLE dbo.PageSplitNewIDFillFactor75
    ADD CONSTRAINT DF__PageSplitNewIDFillFactor75__Foo
        DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;

    SET NOCOUNT ON;
    DECLARE @i                      INTEGER = 0;
    WHILE ( @i < 2500000 )
    BEGIN
        INSERT INTO dbo.PageSplitNewIDFillFactor75 DEFAULT VALUES;
        SET @i = @i + 1;
    END;
    SET NOCOUNT OFF;
END;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'PageSplitNewIDFillFactor70'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.PageSplitNewIDFillFactor70;
    CREATE TABLE dbo.PageSplitNewIDFillFactor70
    (
        PageSplitNewIDFillFactor70_PK
                                UNIQUEIDENTIFIER NOT NULL,
        Foo                     VARBINARY( 512 ) NOT NULL
    );

    ALTER TABLE dbo.PageSplitNewIDFillFactor70
    ADD CONSTRAINT PK__PageSplitNewIDFillFactor70
        PRIMARY KEY CLUSTERED ( PageSplitNewIDFillFactor70_PK )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 70 )
    ON  [PRIMARY];

    ALTER TABLE dbo.PageSplitNewIDFillFactor70
    ADD CONSTRAINT DF__PageSplitNewIDFillFactor70__PageSplitNewIDFillFactor70_PK
        DEFAULT NEWID() FOR PageSplitNewIDFillFactor70_PK;

    ALTER TABLE dbo.PageSplitNewIDFillFactor70
    ADD CONSTRAINT DF__PageSplitNewIDFillFactor70__Foo
        DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;

    SET NOCOUNT ON;
    DECLARE @i                      INTEGER = 0;
    WHILE ( @i < 2500000 )
    BEGIN
        INSERT INTO dbo.PageSplitNewIDFillFactor70 DEFAULT VALUES;
        SET @i = @i + 1;
    END;
    SET NOCOUNT OFF;
END;
GO

IF NOT EXISTS ( SELECT  1
                FROM    sys.objects
                WHERE   name = 'PageSplitNewIDFillFactor65'
                    AND type = 'U' )
BEGIN
    --DROP TABLE dbo.PageSplitNewIDFillFactor65;
    CREATE TABLE dbo.PageSplitNewIDFillFactor65
    (
        PageSplitNewIDFillFactor65_PK
                                UNIQUEIDENTIFIER NOT NULL,
        Foo                     VARBINARY( 512 ) NOT NULL
    );

    ALTER TABLE dbo.PageSplitNewIDFillFactor65
    ADD CONSTRAINT PK__PageSplitNewIDFillFactor65
        PRIMARY KEY CLUSTERED ( PageSplitNewIDFillFactor65_PK )
    WITH ( DATA_COMPRESSION = PAGE, FILLFACTOR = 65 )
    ON  [PRIMARY];

    ALTER TABLE dbo.PageSplitNewIDFillFactor65
    ADD CONSTRAINT DF__PageSplitNewIDFillFactor65__PageSplitNewIDFillFactor65_PK
        DEFAULT NEWID() FOR PageSplitNewIDFillFactor65_PK;

    ALTER TABLE dbo.PageSplitNewIDFillFactor65
    ADD CONSTRAINT DF__PageSplitNewIDFillFactor65__Foo
        DEFAULT CONVERT( VARBINARY( 512 ), REPLICATE( 0x01, 512 ) ) FOR Foo;

    SET NOCOUNT ON;
    DECLARE @i                      INTEGER = 0;
    WHILE ( @i < 2500000 )
    BEGIN
        INSERT INTO dbo.PageSplitNewIDFillFactor65 DEFAULT VALUES;
        SET @i = @i + 1;
    END;
    SET NOCOUNT OFF;
END;
GO

For index density after the initial seeding, the following query is used:

SELECT  si.name, sips.avg_fragment_size_in_pages, 
        sips.page_count, sips.fragment_count, 
        sips.avg_fragmentation_in_percent,
        sips.avg_page_space_used_in_percent
FROM    sys.indexes si
CROSS APPLY sys.dm_db_index_physical_stats( DB_ID(), si.object_id, si.index_id, DEFAULT, 'DETAILED' ) sips 
WHERE   si.name IN ( 'PK__PageSplitIdentity', 'PK__PageSplitNewID', 
            'PK__PageSplitNewIDFillFactor75', 'PK__PageSplitNewIDFillFactor70', 'PK__PageSplitNewIDFillFactor65' )
    AND sips.index_level = 0;

And here are the density results ( notice the ~69% in the last column for the UNIQUEIDENTIFIERs ):

As you might expect, the non-sequential indexes are very, very fragmented after 2.5MM inserts and no maintenance. Moving on to check the page splits that occurred by checking the extended event we set up earlier, we can use this query:

SELECT  si.name, sc.split_count, si.fill_factor
FROM (  SELECT  allocation_unit_id = slot.value( '( value )[ 1 ]', 'BIGINT' ),
                split_count = slot.value( '( @count )[ 1 ]', 'BIGINT' )
        FROM (  SELECT  target_data = CONVERT( XML, target_data ) 
                FROM    sys.dm_xe_sessions xs
                INNER JOIN sys.dm_xe_session_targets xst
                    ON xs.address = xst.event_session_address
                WHERE   xs.name = 'TrackPageSplits'
                    AND xst.target_name = 'histogram' ) s
        CROSS APPLY s.target_data.nodes( 'HistogramTarget/Slot' ) n ( slot ) ) sc
INNER JOIN sys.allocation_units sau
    ON  sc.allocation_unit_id = sau.allocation_unit_id
INNER JOIN sys.partitions sp
    ON  sau.container_id = sp.partition_id
INNER JOIN sys.indexes si
    ON  sp.object_id = si.object_id
    AND sp.index_id = si.index_id;

The results?

13 page splits for the good cluster, over 12500 for each of the non-sequential. This is bad in all non-sequential cases, but as originally mentioned, maintaining the indexes is an important part of making this work, so let's clean them up.

ALTER INDEX PK__PageSplitIdentity
    ON  dbo.PageSplitIdentity REBUILD;

ALTER INDEX PK__PageSplitNewID
    ON  dbo.PageSplitNewID REBUILD;

ALTER INDEX PK__PageSplitNewIDFillFactor75
    ON  dbo.PageSplitNewIDFillFactor75 REBUILD;

ALTER INDEX PK__PageSplitNewIDFillFactor70
    ON  dbo.PageSplitNewIDFillFactor70 REBUILD;

ALTER INDEX PK__PageSplitNewIDFillFactor65
    ON  dbo.PageSplitNewIDFillFactor65 REBUILD;

And now:

Okay, with fresh indexes, the page splits are now gone, fragmentation is largely the same for each of the tables and they're conforming to their FILL_FACTORs. The non-sequential tables will put more pressure on I/O than the sequential one for size and density difference reasons, but will still perform acceptably. Now we can simulate some loads to these tables and see if page split related performance issues are likely to overcome the benefits of clustering the tables in the first place:

SET NOCOUNT ON;
DECLARE @i                      INTEGER = 0;
WHILE ( @i < 50000 )
BEGIN
    INSERT INTO dbo.PageSplitIdentity DEFAULT VALUES;
    SET @i = @i + 1;
END;
SET NOCOUNT OFF;
GO

SET NOCOUNT ON;
DECLARE @i                      INTEGER = 0;
WHILE ( @i < 50000 )
BEGIN
    INSERT INTO dbo.PageSplitNewID DEFAULT VALUES;
    SET @i = @i + 1;
END;
SET NOCOUNT OFF;
GO

SET NOCOUNT ON;
DECLARE @i                      INTEGER = 0;
WHILE ( @i < 50000 )
BEGIN
    INSERT INTO dbo.PageSplitNewIDFillFactor75 DEFAULT VALUES;
    SET @i = @i + 1;
END;
SET NOCOUNT OFF;
GO

SET NOCOUNT ON;
DECLARE @i                      INTEGER = 0;
WHILE ( @i < 50000 )
BEGIN
    INSERT INTO dbo.PageSplitNewIDFillFactor70 DEFAULT VALUES;
    SET @i = @i + 1;
END;
SET NOCOUNT OFF;
GO

SET NOCOUNT ON;
DECLARE @i                      INTEGER = 0;
WHILE ( @i < 50000 )
BEGIN
    INSERT INTO dbo.PageSplitNewIDFillFactor65 DEFAULT VALUES;
    SET @i = @i + 1;
END;
SET NOCOUNT OFF;

And the results:

The only table affected by page splits after another 50000 rows were inserted was the non-sequential UNIQUEIDENTIFIER under FILL_FACTOR 100. This absolutely could potentially cause performance issues. However, as shown, any of the other tested FILLFACTOR levels caused no additional page splits. I actually needed to insert another 0.5MM into the tables before I started getting page splits on the dbo.PageSplitNewIDFillFactor75 table during my testing, and that one was over the mathematical threshold anyway.

So, to reiterate, obviously, if a thin, sequential clustering key can be selected, it is the best choice. However, as I originally said, clustering a non-sequential key is a completely viable course of action, given no better option, if done responsibly. Your stated performance problems are not those of page splitting issues, they are forward fetch issues on heaps, which can be solved by adding a clustered index. So add one. If your heap has a non-clustered primary key that is a UNIQUEIDENTIFIER and that is being used throughout your software suite, cluster and FILL_FACTOR 70 that sucka, measure and adjust from there - the more time you spend putting out fires and explaining performance metrics, the less time you have to adjust the model to a more suitable form.

Don't forget to drop your extended event session!

DROP EVENT SESSION [TrackPageSplits]
ON  SERVER;
GO

Background

Best Answer

Related Solutions

Sql-server – Why not rebuild indexes with page count <1000

Sql-server – Howto Identify Clustered Index Candidates

Related Question