I am familiar with the concept of index fragmentation and how to fix it, but I'm not really sure of one specific item.
When you look at index fragmentation, it's always reported in terms of Percentage.
- What is that Percentage a percent of?
fragmentationindexsql server
I am familiar with the concept of index fragmentation and how to fix it, but I'm not really sure of one specific item.
When you look at index fragmentation, it's always reported in terms of Percentage.
This is by no means a full answer but may move things along a bit if you were to try something similar and report your results.
I couldn't reproduce them. With the following test table
CREATE TABLE [dbo].[Table]
(
Col BIGINT
)
CREATE NONCLUSTERED INDEX IX ON [dbo].[Table](Col)
INSERT INTO [dbo].[Table]
SELECT top 12000 ROW_NUMBER() OVER (ORDER BY @@SPID)
FROM master..spt_values v1, master..spt_values v2
And multiple runs of the following script
USE FragTest;
DECLARE @DbccPage TABLE (
ParentObject VARCHAR(255),
Object VARCHAR(255),
Field VARCHAR(255),
VALUE VARCHAR(255))
DECLARE @sp_index_info TABLE (
PageFID TINYINT,
PagePID INT,
IAMFID TINYINT,
IAMPID INT,
ObjectID INT,
IndexID TINYINT,
PartitionNumber TINYINT,
PartitionID BIGINT,
iam_chain_type VARCHAR(30),
PageType TINYINT,
IndexLevel TINYINT,
NextPageFID TINYINT,
NextPagePID INT,
PrevPageFID TINYINT,
PrevPagePID INT,
PRIMARY KEY (PageFID, PagePID));
DECLARE @I INT = 0
WHILE @I < 2
BEGIN
DECLARE @Online VARCHAR(3) = CASE
WHEN @I = 0 THEN 'OFF'
ELSE 'ON'
END
EXEC('ALTER INDEX [IX] ON [dbo].[Table]
REBUILD WITH
(
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
ONLINE = ' + @Online + ',
SORT_IN_TEMPDB = ON
);')
INSERT INTO @sp_index_info
EXEC ('DBCC IND ( FragTest, ''[dbo].[Table]'', 2)' );
; WITH T
AS (SELECT *,
PagePID - ROW_NUMBER() OVER (PARTITION BY PageType, IndexLevel ORDER BY PagePID) AS Grp
FROM @sp_index_info)
SELECT PageType,
MIN(PagePID) AS StartPID,
MAX(PagePID) AS EndPID,
COUNT(*) AS [count],
IndexLevel
FROM T
GROUP BY Grp,
PageType,
IndexLevel
ORDER BY PageType DESC,
StartPID
DECLARE @DynSQL NVARCHAR(4000)
SELECT @DynSQL = N'DBCC PAGE (FragTest, ' + LTRIM(PageFID) + ',' + LTRIM(PagePID) + ',3) WITH TABLERESULTS'
FROM @sp_index_info
WHERE PageType = 10
INSERT INTO @DbccPage
EXEC(@DynSQL)
SELECT VALUE AS SinglePageAllocations
FROM @DbccPage
WHERE VALUE <> '(0:0)'
AND Object LIKE '%IAM: Single Page Allocations%'
SELECT avg_page_space_used_in_percent,
avg_fragmentation_in_percent,
fragment_count,
page_count,
@Online AS [Online],
(SELECT COUNT(*)
FROM @DbccPage
WHERE VALUE <> '(0:0)'
AND Object LIKE '%IAM: Single Page Allocations%') AS SinglePageAllocations
FROM sys.dm_db_index_physical_stats(db_id(), object_id('[dbo].[Table]'), 2, NULL, 'DETAILED')
WHERE index_level = 0
DELETE FROM @sp_index_info
DELETE FROM @DbccPage
SET @I = @I + 1
END
I consistently got results like
PageType StartPID EndPID count IndexLevel
-------- ----------- ----------- ----------- ----------
10 119 119 1 NULL
2 2328 2351 24 0
2 2352 2352 1 1
2 2384 2392 9 0
SinglePageAllocations
----------------------
(0 row(s) affected)
avg_page_space_used_in_percent avg_fragmentation_in_percent fragment_count page_count Online SinglePageAllocations
------------------------------ ---------------------------- -------------------- -------------------- ------ ---------------------
98.8139362490734 0 2 33 OFF 0
PageType StartPID EndPID count IndexLevel
-------- ----------- ----------- ----------- ----------
10 115 115 1 NULL
2 114 114 1 0
2 118 118 1 1
2 2416 2449 34 0
SinglePageAllocations
-----------------------
(1:114)
(1:118)
avg_page_space_used_in_percent avg_fragmentation_in_percent fragment_count page_count Online SinglePageAllocations
------------------------------ ---------------------------- -------------------- -------------------- ------ ---------------------
97.4019644180875 2.85714285714286 2 35 ON 2
At least in the test I did the differences between the two balanced out fragmentation wise (though similarly to your test I did find that rebuilding the index online led to a higher page count.).
I found that the Online = OFF
version always used uniform extents and had zero single page allocations whereas the Online = ON
always seemed to put the index root page and first index leaf page in mixed extents.
Putting the first index leaf page in a mixed extent and the rest in contiguous uniform extents causes a fragment count of 2.
The Online = OFF
version avoids the fragment caused by the lone index leaf page but the contiguity of the leaf pages is broken by the index root page that shares the same extents and this too has a fragment count of 2.
I was running my test on a newly created database with 1 GB of free space and no concurrent activity. Perhaps the Online = OFF
version is more vulnerable to concurrent allocations causing it to be given non contiguous uniform extents.
FILLFACTOR
only applies when you build or rebuild an index, not during normal operation. Normal operations always try to fill the pages to 100%.
If you insert a row that has a variable width, then update the row to be longer, that row will no longer fit on the page if there isn't enough extra space to store the after-image on the same page. If there isn't enough space, this will cause a page split, which is the process that creates the necessary space.
What's a bit misleading about page splits is that there are "good splits" and "bad splits," even though the performance counter counts all of them.
A good split is when new rows are added to the end of the index, like what happens when you do your initial batch of INSERT
s. The new row doesn't fit on the last page, so the storage engine must allocate a new page and logically hook it up to the last index page. The new page will probably exist physically after the old-last index page.
A bad split is when a page must be inserted in the middle of an index: the new page is attached to the index structure logically on both sides, but may (likely) not exist in contiguous physical order to those pages.
Fragmentation is the discrepancy between the logical and physical order, and for the most part, only the bad type of page splits cause fragmentation.
Because you're inflating the size of already-existing rows, this causes the bad type of page split, which is why you're seeing high fragmentation numbers.
It's unclear what your exact process is, and how many rows are ultimately going to end up in this table. If it's something like a one-time population, do the full population process, then turn around and rebuild the clustered index with 100% FILLFACTOR
.
If this process happens continuously, you could "pre-allocate" the space by doing the following: add a dummy variable-width column to the table, populate it to the max length on INSERT
, and then on the first UPDATE
, set the dummy value to NULL
while updating the real value. This method will probably add overhead to the logging mechanism because of all the data values flying around.
In general, though, you only need to concern yourself with fragmentation when all of these factors are true:
I would recommend, however, putting in place a solid index maintenance solution to keep things more or less in check. You can search this site for recommendations with regards to that.
Best Answer
It's going to be logical fragmentation for indexes, and extent fragmentation for heaps. The BOL reference on
sys.dm_db_index_physical_stats
actually gives pretty good information on the topic:Think about an index, as highlighted above it's the "percentage of out-of-order pages in the leaf pages". An index page is going to be a doubly-linked list. In other words, each page points to the next page and the previous page in it's respective index level. Here's a picture on what a doubly linked list looks like (for illustrative purposes):
Note: the above image was just randomly pulled from the internet, it's a simple representation. But it does lack
A
's next pointer, andF
's previous pointer. I didn't want any confusion that it's exactly how a leaf page in a SQL Server index would look like.This structure is going to represent the logical flow of the data. In other words, what page comes next regardless of what the physical reality is. That logical fragmentation comes into play when the actual physical next page isn't what the next page is pointed to by the current leaf page.
Given the above definition, say you have four index pages. 1 out of 4 of those index pages next page pointer is not the physical page next to it, but the other 3 do have that physical nature. In that case, your percent of fragmentation would be 25%. Of course, that was a simple example and in reality you wouldn't care about an index less than thousands of pages. But hopefully that makes it a little clearer exactly what that number represents.
Here's a few good reads on Clustered Index Structures, Nonclustered Index Structures, and Heap Structures.