SQL Server – fragmenting the index on a table with stable traffic

clustered-index, fragmentation, index, sql-server

I am using Ola Hallengren's solution for optimizing indexes. I run it on Sunday every week. The index has had low fragmentation for the last 6 months and it hasn't needed reorganization. The way it's being used hasn't changed either.

For the last three weeks, every time it runs it reports that the clustered index on the database's biggest table is between 5% and 12% fragmented. Strangely enough, this is not occurring during weekdays, so the fragmentation must appear sometime before the maintenance job runs.

My table has a timestamp on every row, so I know the traffic has stayed at the same level for months. Usually fragmentation grows by less than 1% per month.

I have two questions:

  1. What else could be fragmenting my index?
  2. Is there an automated solution I can set up to track changes? Ideally something not disruptive since this is a production box.

The table has 134 columns, contains 14 GB worth of data, and the primary key is not an identity (sigh)

The index in question looks like this:

  • Average row size: 1413 bytes
  • Depth: 4
  • Leaf-level rows: 9884500
  • Maximum row size: 2303 bytes
  • Minimum row size: 746 bytes
  • Pages: 1904907
  • Partition ID: 1

    CREATE UNIQUE CLUSTERED INDEX [FOOINDEX] ON [dbo].[FOOTABLE]
    (
        [FOONO] ASC,
        [ID] ASC
    )
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
          IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
    ON [PRIMARY]
    

Best Answer

In General

Just to be clear at the start: fragmentation refers to the situation in which the next logical data page (i.e. the values in the fields for one or more rows) is not the next physical data page (the location of the page in the data file).

Non-fragmented pages (in Physical Page order):

  1. Logical Page 1: A1, A2, A3, A4
  2. Logical Page 2: B1, B2, B3, B4
  3. Logical Page 3: C1, C2, C3, C4

Fragmented pages (in Physical Page order):

  1. Logical Page 1: A1, A2, A3, A4
  2. Logical Page 3: C1, C2, C3, C4
  3. Logical Page 2: B1, B2, B3, B4
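
To see this logical-versus-physical ordering on a real index, you can dump the page chain. The sketch below uses sys.dm_db_database_page_allocations (an undocumented DMF, available starting with SQL Server 2012) against the table from the question; note that DETAILED mode touches every page, so be careful on a large production table.

    -- Hedged sketch: data pages of the clustered index (index_id = 1) in
    -- physical order, with their logical previous/next pointers. Wherever
    -- LogicalNextPageID is not the physically following page, that jump is
    -- what gets counted as fragmentation.
    SELECT  alloc.allocated_page_page_id AS PhysicalPageID,
            alloc.previous_page_page_id  AS LogicalPreviousPageID,
            alloc.next_page_page_id      AS LogicalNextPageID
    FROM    sys.dm_db_database_page_allocations(
                DB_ID(), OBJECT_ID(N'dbo.FOOTABLE'), 1, NULL, 'DETAILED') alloc
    WHERE   alloc.page_type_desc = 'DATA_PAGE'
    AND     alloc.is_allocated = 1
    ORDER BY alloc.allocated_page_page_id;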

In a general sense, fragmentation is caused by:

  • Data inserted out of order (the order of the index in question).
    • Using ever-increasing values (e.g. an IDENTITY field) is one way to mitigate this as rows will always be created at the end of the index (assuming an ASC sort order).
    • FILLFACTOR is another means of reducing fragmentation when not using an ever-increasing value. But unlike an ever-increasing value, FILLFACTOR only reserves a certain amount of space per data page, and depending on what data comes in, it might not be enough, in which case page splits will still occur. And depending on the maximum size of the row, it might be that no amount of reserved space will help (if the rows are over 4030 bytes -- more than half a page).
  • Updates of variable-length fields (e.g. VARCHAR, NVARCHAR) and of NULLable fixed-length columns marked as SPARSE that change from NULL to a non-NULL value. If the combination of changes to these types of fields increases the size of the row and there is no more room left on the data page, a page split will occur.
    • FILLFACTOR again helps here as it reserves some amount of space (assuming the row size is small enough to allow for it) for update operations to increase the size of the row without needing a page split (see the example after this list).
  • Deleting enough rows to empty out one or more data pages
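
To make the FILLFACTOR point concrete, this is the general form of applying one during a rebuild. The index name comes from the question; 90 is purely an illustrative value, not a recommendation for this table:

    -- Hedged sketch: rebuild the index, leaving 10% free space on each leaf
    -- page for future out-of-order inserts and row-widening updates.
    ALTER INDEX [FOOINDEX] ON [dbo].[FOOTABLE]
        REBUILD WITH (FILLFACTOR = 90);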


In Particular

For this particular index, the average row size is 1413 bytes, so you will only get 5 rows (again, on average) per page. No FILLFACTOR is specified, so it takes the system default (which, unless the "default" has been changed, is 0, meaning fill the data page). Given that FOONO is VARCHAR(12), it seems most likely that the values are not ever-increasing. That would give you some amount of fragmentation quite often as new rows come in. Of course, there will be a slight amount of naturally reserved space, even without specifying a FILLFACTOR, since the row size is variable. Meaning, after a REBUILD there might be 2000 bytes left on a page because the next row in order was larger than 2000 bytes and could not fit. If a new row comes in that logically fits between the existing rows and is <= 2000 bytes, it should go there. Hence, for these cases, in order to get a sense of the relationship between the values coming in and the increase in fragmentation, you need to identify the new rows, which seems doable given the timestamp field.
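
As a low-impact way to track this between the weekly maintenance runs (your question 2), you can sample the index's physical stats directly. A sketch, using the table name from the question; SAMPLED mode reads a subset of the leaf pages and is considerably gentler on a production box than DETAILED:

    -- Hedged sketch: leaf-level fragmentation and page fullness for the
    -- clustered index (index_id = 1). LIMITED mode is lighter still, but
    -- leaves avg_page_space_used_in_percent and record_count as NULL.
    SELECT  ips.avg_fragmentation_in_percent,
            ips.avg_page_space_used_in_percent,
            ips.page_count,
            ips.record_count
    FROM    sys.dm_db_index_physical_stats(
                DB_ID(), OBJECT_ID(N'dbo.FOOTABLE'), 1, NULL, 'SAMPLED') ips
    WHERE   ips.index_level = 0;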

It has been confirmed that this table is INSERT-only. That narrows down the source of fragmentation to the new rows coming in. The variance in fragmentation should be a function of a) the distribution of the values of the FOONO field, and b) the width of the new rows. Given that there is a DATETIME field indicating when the rows are inserted, you should be able to query the following on a per-week basis for the past 6 weeks (a sketch of such a query appears after this list):

  1. The FOONO field
  2. The ID field
  3. The total size of the variable-length fields in each row (based on the data there; using COALESCE(DATALENGTH(ColumnName), 0) for each of those fields). You are not concerned about the fixed-length fields since they aren't a factor in the increase in fragmentation (unless they are marked as SPARSE and are now coming in as non-NULL values whereas in prior weeks they were typically NULL).
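
A minimal sketch of that query, assuming TimestampField and VarCol1 through VarCol3 are placeholders for the real column names (repeat the DATALENGTH term for every variable-length column in the table):

    -- Hedged sketch: one week of new rows with their key values and the total
    -- width of the variable-length columns. Shift the date window to cover
    -- each of the past 6 weeks in turn.
    SELECT  [FOONO],
            [ID],
            [TimestampField],
            COALESCE(DATALENGTH([VarCol1]), 0)
              + COALESCE(DATALENGTH([VarCol2]), 0)
              + COALESCE(DATALENGTH([VarCol3]), 0) AS TotalVariableWidth
    FROM    [dbo].[FOOTABLE]
    WHERE   [TimestampField] >= DATEADD(WEEK, -1, GETDATE())
    ORDER BY [TimestampField];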

What you are looking for (comparing the most recent 3, or now 4, weeks with the period when fragmentation was lower; and be sure to ORDER BY the DATETIME field) is at least one of the following:

  1. a more varied distribution of FOONO values
  2. ID values:
    1. a distribution that is interspersed with existing ID values, AND / OR
    2. non-sequential / out-of-order values with respect to chronological INSERT order, as denoted by the DATETIME timestamp field (hence the need to ORDER BY TimestampField)
  3. a greater overall row size

Any combination of those things happening should account for an increase in fragmentation.
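
Item 2.2 can be flagged mechanically. A sketch, assuming SQL Server 2012 or later (for LAG) and again using TimestampField as a placeholder: any row whose ID is lower than the ID inserted just before it (within the same FOONO) had to be placed between existing rows.

    -- Hedged sketch: find inserts that arrived out of key order relative to
    -- chronological INSERT order.
    ;WITH ordered AS
    (
        SELECT  [FOONO],
                [ID],
                [TimestampField],
                LAG([ID]) OVER (PARTITION BY [FOONO]
                                ORDER BY [TimestampField]) AS PreviousID
        FROM    [dbo].[FOOTABLE]
        WHERE   [TimestampField] >= DATEADD(WEEK, -6, GETDATE())
    )
    SELECT  [FOONO], [ID], [TimestampField]
    FROM    ordered
    WHERE   [ID] < PreviousID
    ORDER BY [TimestampField];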

Where the Rubber Meets the Road

The common theme among these 4 possibilities is that they all have the effect of increasing the rate of page splits. Page splits occur when a new row (or an updated row that is larger than it was before, but for now we are only concerned with new rows) should logically be placed on a data page that does not have enough free space to fit it. When this happens, SQL Server will (typically) take some amount of the latter portion of the existing data and move it to a new page, and then add the new row to whichever of the two pages it logically falls on. The end result is that the new page will have at least half, if not slightly more, of the rows in it (I tested several times with a table that holds 10 rows per page, and each page split left the existing page with 5 rows and the new page with 6 rows).

The typical page split is just half of the story, though. The other half is what happens when data is added sequentially to these new pages created by the split operations. As new rows are added and the new pages fill up, new pages are created without moving any rows. When this happens the new page is not just in logical order, but also in physical order as compared to the prior page. This is not considered a fragmented page because from the prior page to this new page, they are in both logical and physical order.

Meaning, if we had full pages A0, B0, and C0 and added some rows that logically belong on B0, a page split would occur, giving us a new page B1 with some amount of rows on both B0 and B1. But now, if we fill up B1 and keep adding rows in sequential order that logically come before the data on C0, it will (or will often) create a new page B2 for the new rows without moving anything from B1 onto B2. Physically the pages will be ordered as either:

  • B1, B2, A0, B0, C0

or

  • A0, B0, C0, B1, B2

The jumps from B0 to B1, and from B2 back to C0, are considered fragmentation as the logical order (A0, B0, B1, B2, C0) does not match the physical order. BUT, the jump from B1 to B2 is not considered fragmentation because the logical order does match the physical order.

Let's put all of this together now. Keeping in mind that the data pages for this particular index are packed fairly tightly (which is fine), most insert operations will have a high probability of landing on an already filled page and hence requiring a split. If there are 4 new rows to add and they are spread out (either in terms of the value of FOONO and/or in terms of ID, where new values might fall between already existing values) such that they should go onto 4 separate pages, that is 4 page splits representing 4 new pages and 8 occurrences of fragmentation (the jump out of order and back into order for each of the splits). BUT, if the 4 new rows are in order, then it is possible that only one page split occurs, representing 1 new page and 2 occurrences of fragmentation. If more rows are added sequentially to the new page created by the split such that a new page is needed, they can roll onto a new next page that will obviously increase the page count by 1 but will not register as an occurrence of fragmentation.

The other scenario--data that is sequential in terms of the new records for the week but was not added in sequential order, as determined by sorting by the "timestamp" DATETIME field--is mostly the same as the main scenario just described. The difference here is that a) without ordering by the "timestamp" field it might appear as though the data is sequential and will falsely be ruled out as the source of fragmentation, and b) page splits will happen more frequently within the set of pages that were just created by the initial page splits.

Hopefully the following example will help illustrate the issue. Assume we are starting with logical pages A0, B0, and C0, and they are also physically in order. Page B0 is full and contains rows Ba, Bf, and Bn. New rows come in that are Bo, Bp, Bq, Br, and Bt. But they don't come in that order. The following maps out what should be happening to the data pages as each row comes in. The page IDs (e.g. A0) indicate the logical page while the order indicates their physical placement.

  • Start:

    A0
    B0 = Ba, Bf, and Bn (page is full)
    C0

  • Add Br:

    B1 = Bn and Br (physically before A0, but logically after B0 = fragmentation)
    A0
    B0 = Ba and Bf (Split: Bn moved to new page B1)
    C0

  • Add Bt:

    B1 = Bn, Br, and Bt (page filled)
    A0
    B0 = Ba and Bf
    C0

  • Add Bp:

    B1 = Bn and Bp (Split: Br and Bt moved to new page B2)
    B2 = Br and Bt (physically AND logically after B1 = NOT fragmentation)
    A0
    B0 = Ba and Bf
    C0

  • Add Bq:

    B1 = Bn, Bp, and Bq (page filled)
    B2 = Br and Bt
    A0
    B0 = Ba and Bf
    C0

  • Add Bo:

    B1 = Bn and Bo (Split: Bp and Bq moved to a new page, which is logically B2)
    B3 = Br and Bt (the page formerly labeled B2; still physically after B1, but now logically after B2 = fragmentation)
    B2 = Bp and Bq (physically after B3, but logically after B1 = fragmentation)
    A0
    B0 = Ba and Bf
    C0

Result: 3 page splits, 3 new pages, 4 counts of fragmentation, 4 pages partially filled

Had those same rows been inserted in order, it would end up looking like:

  • Start:

    A0
    B0 = Ba, Bf, and Bn (page is full)
    C0

  • Final:

    B1 = Bn, Bo, and Bp
    B2 = Bq, Br, and Bt
    A0
    B0 = Ba and Bf (Split: Bn moved to new page B1)
    C0

Result: 1 page split, 2 new pages, 2 counts of fragmentation, 1 page partially filled
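
If you want to see the difference first-hand, the following is a minimal repro sketch (made-up table name; run it in a scratch database, not on the production box). It packs pages tightly with even keys, then forces splits by inserting the odd keys in between:

    -- Hedged sketch: rows are ~2 KB wide so only about 4 fit per page.
    CREATE TABLE dbo.SplitDemo
    (
        KeyVal INT NOT NULL,
        Filler CHAR(2000) NOT NULL DEFAULT ('x'),
        CONSTRAINT PK_SplitDemo PRIMARY KEY CLUSTERED (KeyVal)
    );

    -- Fill the pages tightly using only even key values...
    ;WITH n AS (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
                FROM sys.all_objects)
    INSERT INTO dbo.SplitDemo (KeyVal) SELECT i * 2 FROM n;

    ALTER INDEX PK_SplitDemo ON dbo.SplitDemo REBUILD;  -- start near 0% fragmentation

    -- ...then insert the odd keys, each of which must land between existing
    -- rows, forcing a split on nearly every page:
    ;WITH n AS (SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS i
                FROM sys.all_objects)
    INSERT INTO dbo.SplitDemo (KeyVal) SELECT (i * 2) - 1 FROM n;

    -- Fragmentation should now be substantial; compare against the value
    -- taken right after the REBUILD.
    SELECT  avg_fragmentation_in_percent, page_count
    FROM    sys.dm_db_index_physical_stats(
                DB_ID(), OBJECT_ID(N'dbo.SplitDemo'), 1, NULL, 'DETAILED')
    WHERE   index_level = 0;

    DROP TABLE dbo.SplitDemo;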


Misc.

Just to have it noted for those who have a similar setup but do allow UPDATE operations: if UPDATEs did occur, then this question would contain a false premise and would ultimately be unanswerable. The issue at hand is that a timestamp (whether using a TIMESTAMP / ROWVERSION datatype or a DATETIME / DATETIME2 datatype) only tells you when the last change occurred; it does not tell you a) how many changes have happened, nor b) what the specific changes were (i.e. did the row get smaller? Did 5 fields get much larger while 20 fields got slightly smaller, so the row size actually stayed the same? etc.). If updates do occur on this table, then the only way to see what caused the increase in fragmentation is to collect audit data of the changes (whether a custom, trigger-based solution or Change Data Capture). But in the current setup, there is no way to reliably correlate the number of rows changed--as indicated by the timestamp field--with any amount of fragmentation (again, IF updates are happening).
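
For reference, enabling Change Data Capture comes down to two system procedure calls, sketched below. This assumes an edition and configuration that supports CDC, with SQL Server Agent running; the table name comes from the question:

    -- Hedged sketch: enable CDC for the database, then for the table.
    EXEC sys.sp_cdc_enable_db;

    EXEC sys.sp_cdc_enable_table
        @source_schema = N'dbo',
        @source_name   = N'FOOTABLE',
        @role_name     = NULL;  -- NULL = no gating role for reading change data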

Also, regarding this statement:

The table has 134 columns, contains 14 GB worth of data, and the primary key is not an identity (sigh)

If this were an OLTP table, then I would find the 134 columns more problematic than the primary key not being based on an IDENTITY field. Since this is a history table, it is probably OK to have 134 columns. But even for a history table, it is not necessarily best that the PK be based on an IDENTITY field, or even that the table have a PK (I am assuming that the PK and the Clustered Index are not the same here, since the index in the question is UNIQUE CLUSTERED and not a PK). And even if we are talking about the Clustered Index (whether or not it is the PK), an IDENTITY is still not necessarily best as the leading column. It all comes down to how the data is accessed. If the leading field is an IDENTITY but that value is never used in JOINs or UPDATE or DELETE operations, then you will have sacrificed performance on all queries, even when there is no fragmentation (not fixable), just to avoid some amount of fragmentation (fixable via REBUILD or REORGANIZE).