SQL Server – index maintenance strategy if few insertion points relative to # of rows

index-tuning, sql-server, sql-server-2008-r2

In SQL Server 2008 R2, I have nonclustered covering indexes on several tables with 100M+ rows each. Each table has a few thousand "insertion points" where all new inserts happen. This means that regardless of fill factor, I quickly end up with page splits and fragmentation at every insertion point, and no fragmentation or splits anywhere else in the table. Unfortunately, queries always include new rows and hence the fragmented areas of the index.

  • what happens when there's a page split but inserts continue sequentially after the split? Is there a way to tell SQL Server to do the split with lots of extra room for subsequent inserts, without wasting space on existing pages via a low fill factor that, for most pages, will never be filled?
  • what are good index maintenance strategies to use for indexes like this?
  • is there a good automated way to identify tables like this where fragmentation is severe but not uniform? These tables don't show up as more than 5% fragmented overall.
  • are there index schema changes I should be considering?

Here's more info about the problem. The indexes all follow this pattern (simplified for clarity):

CREATE TABLE Foo (
    id int IDENTITY(1,1) PRIMARY KEY CLUSTERED,
    foreign_key int,
    log_time datetime,
    ...);

CREATE NONCLUSTERED INDEX IX_Foo_foreign_key_log_time
    ON Foo (foreign_key, log_time) INCLUDE (...);

Queries on this table are always in this form:

WHERE log_time > getdate()-70 AND foreign_key IN (select ...)

Other facts:

  • there are about 5,000 foreign_key values, each with tens of thousands of rows.
  • average row size is 55 bytes, meaning around 150 rows per page
  • the IN filter usually includes 10%–50% of the foreign_key values, and the date filter includes 20%–40% of the rows. On average, about 15% of total rows are selected.
  • the index is a covering index for the queries, so no clustered index access is needed.

Best Answer

The answer by ThomasStringer is very good, and using partitioning here would likely help overall performance and decrease your maintenance cost. However, it won't do anything for your page split scenario.
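For reference, a partitioning setup along those lines might look like this sketch, partitioning on log_time so that older, stable data and the actively inserted range live in separate partitions. All names and boundary dates here are illustrative, not from the original answer:

```
-- Illustrative sketch: monthly RANGE RIGHT partitions on log_time.
-- Boundary values and the target filegroup are placeholders.
CREATE PARTITION FUNCTION pf_log_time (datetime)
    AS RANGE RIGHT FOR VALUES ('2013-01-01', '2013-02-01', '2013-03-01');

CREATE PARTITION SCHEME ps_log_time
    AS PARTITION pf_log_time ALL TO ([PRIMARY]);

-- The nonclustered index could then be rebuilt on the scheme, letting you
-- rebuild/reorganize only the partitions that actually fragment.
```

With this in place, index maintenance can target individual partitions (`ALTER INDEX ... REBUILD PARTITION = n`) instead of the whole 100M+ row index.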

Can you verify that your FKs are trusted? This won't decrease or otherwise affect your page splits, but it will help your query execution time.

-- Builds ALTER TABLE ... WITH CHECK CHECK CONSTRAINT statements for any
-- untrusted foreign keys (first SELECT) and check constraints (second SELECT).
-- Run this with output set to TEXT, then paste the generated statements into
-- a query window and execute them to revalidate the constraints.
SELECT 'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(o.object_id)
     + ' WITH CHECK CHECK CONSTRAINT ' + i.name + '
GO'
FROM sys.foreign_keys i
JOIN sys.objects o ON i.parent_object_id = o.object_id
JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE i.is_not_trusted = 1 AND i.is_not_for_replication = 0
UNION
SELECT 'ALTER TABLE ' + s.name + '.' + OBJECT_NAME(o.object_id)
     + ' WITH CHECK CHECK CONSTRAINT ' + i.name + '
GO'
FROM sys.check_constraints i
JOIN sys.objects o ON i.parent_object_id = o.object_id
JOIN sys.schemas s ON o.schema_id = s.schema_id
WHERE i.is_not_trusted = 1 AND i.is_not_for_replication = 0;

Another idea you already hit on is fill factor: if you use a lower fill factor and combine it with either row- or page-level compression, your per-page data density will increase. This won't eliminate page splits at your insertion points, but it can reduce how often they occur, because more rows fit on each page before a split is needed.
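A rebuild combining those two settings might look like the following. The index name matches the simplified example above and the fill factor value is a placeholder you'd tune to your insert pattern:

```
-- Row compression plus a lower fill factor; 90 is an illustrative value.
ALTER INDEX IX_Foo_foreign_key_log_time ON Foo
REBUILD WITH (DATA_COMPRESSION = ROW,
              FILLFACTOR = 90,
              SORT_IN_TEMPDB = ON);
-- Add ONLINE = ON if you're on Enterprise Edition and need the index
-- available during the rebuild.
```

You can estimate the space saving beforehand with `sp_estimate_data_compression_savings` to decide whether ROW or PAGE compression pays off for this index.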

Lastly, if you can, look at changing your index to lead with the log_time column instead of the foreign key column. Depending on how the log_time data arrives (hopefully it's more "in order" than the FK column), this could decrease your page splits significantly, because new rows would land at the logical end of the index rather than at thousands of separate insertion points. If you pair this with ROW compression and a good partition scheme, you might see some significant improvements.
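A reordered version of the earlier example index would look like this (name is again a placeholder; the INCLUDE list stays whatever covers your queries):

```
-- Leading on log_time keeps inserts clustered near the end of the index
-- when log_time values arrive roughly in order, at the cost of seeking
-- a log_time range first and filtering foreign_key within it.
CREATE NONCLUSTERED INDEX IX_Foo_log_time_foreign_key
    ON Foo (log_time, foreign_key) INCLUDE (...);
```

The trade-off to verify with your actual queries: with log_time leading, the `WHERE log_time > getdate()-70` predicate becomes a single range seek, but the `foreign_key IN (...)` filter is applied within that range rather than driving separate seeks per key. Check the plans for both orderings before committing.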