Indexing for an ascending date value

index performance

Let's say you have a table of invoices

CREATE TABLE Invoice
(  InvoiceID int IDENTITY(1,1) PRIMARY KEY,
   InvoiceDate DateTime CONSTRAINT [df_invoicedate] DEFAULT GETDATE(),
   ....
)

The invoice table has an InvoiceDate which is always the current date/time when the record is inserted.

Let's say you index this date

CREATE INDEX idx_InvoiceDate ON Invoice(InvoiceDate)

Note this is a non-clustered index (held in a structure separate from the table data).

Let's say we know the RDBMS uses B+tree format for index files.

Bearing in mind that all inserts will hit the same page of the index file (because they are always at the end of the sort order), and that inserts into a single location are worst-case for B+tree insert, what special considerations should be observed?

Will this index end up lopsided? Should it be reindexed regularly? Should I use a low, normal, or high fill factor?

Should we try to find something else to put into the index prior to the date, e.g. an invoice type, to get the tree to balance out more evenly?

Will there be issues with hotspots or index page contention, because all inserts hit the same part of the index? What can I do to mitigate that?

I understand that for an ascending surrogate key, a reverse key index might be better. But this is a date and I may want to perform scans of a date range. Is there anything I can do that will alleviate any of the issues that reverse key indices are meant to address?

Or is all of this a total non-issue, and indices of this kind are fine and I should stop worrying about it?

Best Answer

Will this index end up lopsided?
No. The way the B-tree algorithm works keeps it balanced. (One interpretation of the "B" in B-tree is "balanced".) When a leaf page fills it splits, with roughly half the rows going to each of the two resulting pages, and a new entry is added to the parent page. When a parent page fills it splits in the same way. Splits cascade up the tree until the root has to split, at which point a new root page is created. At every stage all paths from root to leaf have the same length, hence "balanced".
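To see the balancing claim in action, here is a minimal toy B+tree in Python that is fed strictly ascending keys, as an ever-increasing InvoiceDate would be. It is only a sketch: the node capacity `ORDER`, the 50/50 split rule, and all names are assumptions made for illustration, and a real DBMS may handle end-of-index inserts differently.

```python
import bisect

ORDER = 4  # hypothetical maximum number of keys per node


class Node:
    def __init__(self, leaf):
        self.leaf = leaf
        self.keys = []      # sorted keys (separator keys for internal nodes)
        self.children = []  # child nodes; stays empty for leaves


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    separator, right = split
    new_root = Node(leaf=False)  # the root split, so the tree grows at the top
    new_root.keys = [separator]
    new_root.children = [root, right]
    return new_root


def _insert(node, key):
    """Insert into a subtree; return (separator, new_right_node) if node split."""
    if node.leaf:
        bisect.insort(node.keys, key)
    else:
        i = bisect.bisect_right(node.keys, key)
        split = _insert(node.children[i], key)
        if split is None:
            return None
        separator, right = split
        node.keys.insert(i, separator)
        node.children.insert(i + 1, right)
    if len(node.keys) <= ORDER:
        return None
    mid = len(node.keys) // 2
    right = Node(node.leaf)
    if node.leaf:
        # Leaf split: the separator is copied up, the keys stay in the leaves.
        right.keys = node.keys[mid:]
        node.keys = node.keys[:mid]
        return right.keys[0], right
    # Internal split: the middle key moves up to the parent.
    separator = node.keys[mid]
    right.keys = node.keys[mid + 1:]
    right.children = node.children[mid + 1:]
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return separator, right


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur."""
    if node.leaf:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def leaf_keys(node):
    """Return all keys stored in the leaves, left to right."""
    if node.leaf:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]


root = Node(leaf=True)
for key in range(1000):  # strictly ascending inserts
    root = insert(root, key)
print(leaf_depths(root))  # one depth only: every leaf is equally far from the root
```

Because the tree only ever grows by splitting the root, ascending inserts cannot make one side deeper than the other. In this toy version the leaves left behind by a split stay half full, which is a space concern rather than a balance concern.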

Should it be reindexed regularly?
Yes. Each DBMS has its own idiosyncrasies, but regular rebuilding of indexes is likely to be recommended. If for no other reason, the statistics will eventually get out of date, which will result in less efficient query plans.

Should I use a low, normal, or high fill factor?
As all writes will be on the right-most pages, leaving free space anywhere else would be a waste: there will never be any writes to use up that free space. Use a fill factor of 100%.
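A rough back-of-the-envelope calculation shows what a lower fill factor would cost on an append-only index. The page size, entry size, and row count below are assumed figures chosen for illustration, and the sketch ignores page headers and other per-page overhead.

```python
import math

PAGE_BYTES = 8192   # assumed page size
ENTRY_BYTES = 16    # assumed size of one index entry
ROWS = 10_000_000   # assumed number of invoices


def leaf_pages(rows, fill_factor_pct):
    """Estimate leaf pages needed when each page is filled to fill_factor_pct."""
    usable_bytes = PAGE_BYTES * fill_factor_pct // 100
    entries_per_page = usable_bytes // ENTRY_BYTES
    return math.ceil(rows / entries_per_page)


for pct in (70, 100):
    print(pct, leaf_pages(ROWS, pct))
```

Since no later insert ever lands on the older pages, the extra pages reserved at a lower fill factor are never reclaimed by new rows.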

Should we try to find something else to put into the index
Indexes aren't (generally speaking) created for entertainment purposes. They're there to improve query performance or support constraint enforcement. If your workload would benefit from a <something> + date index then create that. If not, don't. Indexes require extra work during writes and consume space and maintenance time. Create the ones you need and no more. And no less. Keep in mind filtered indexes and included columns. Bear in mind that optimiser math may mean your indexes aren't used anyway.

Will there be issues with hotspots or index page contention .. What can I do to mitigate that?
Well, yes, in theory. First test and prove you'll have a problem meeting your expected transactions-per-second count. If you can't meet it, are you sure it's this index that's the limiting factor? If it is, in-memory tables and snapshot isolation may implement concurrency mechanisms different to the "base" system's, depending on DBMS. There may be alternative storage engines whose characteristics differ. Try those to see if the pain recedes. You can always post a follow-up question here with quantitative observations!

I understand that for an ascending surrogate key..
Surrogate keys are made-up values that replace human-intelligible natural keys. They're used for reasons related to DBMS implementation peculiarities and are not inherent to the relational model. Your date is not a surrogate key - it is business data.

a reverse key index might be better.
Hmmm .. OK, so instead of writing all today's rows to key value 20161021 you'd write them to .. 12016102? How does that help? Hashing won't help either: to use the hashed values for lookup each date would have to produce the same hash so there's still a hotspot.
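The effect of reversing a date key can be sketched in a few lines of Python. The yyyymmdd strings and the `reverse_key` helper are hypothetical stand-ins for whatever a particular DBMS does internally. The sketch illustrates two things: rows with the same date still share one key, and the reversed ordering no longer keeps a date range contiguous.

```python
dates = ["20161019", "20161020", "20161021", "20161022", "20161101"]


def reverse_key(key):
    """Reverse the characters of a key, as a reverse key index would reverse bytes."""
    return key[::-1]


# Every row inserted on the same day still maps to one reversed key.
todays_keys = {reverse_key(d) for d in ["20161021"] * 3}

# Order in which the dates would sit in an index sorted on the reversed key.
index_order = sorted(dates, key=reverse_key)

print(todays_keys)
print(index_order)
```

In the reversed ordering the November date falls between October dates, so a scan for an October date range could no longer read one contiguous run of index entries.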

Or is all of this a total non-issue, and indices of this kind are fine and I should stop worrying about it?
Probably. The vast majority of indexes on most systems are like this. By-and-large they work just fine and the DBMS is able to process significant workloads on modest hardware. Indexing is one of those things you can (and should) tweak ad nauseam after go-live. Concentrate on delivering a normalised, feature-rich, debugged system. Tune it afterwards.

Related Solutions

Index Performance – Does Index Help When Ordering by Opposite Direction?

If the index is covering, it's likely it will be used in conjunction with a sort operation in the query plan to reverse the order.

Edit: Following a little more thought!

It will depend on whether this is a trivial query or involves joins. If trivial:

SELECT x,y FROM MyTable ORDER BY y DESC

and the index order is Y ASC, a reverse scan of the index leaf level should avoid a sort.
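The trivial case can be tried out with a small self-contained script. Note the assumptions: it uses SQLite through Python's sqlite3 module rather than SQL Server, the index name and the covering column list are made up for the example, and another engine's optimiser may choose differently. It only illustrates that an ascending index can be read backwards to satisfy a descending ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (x INTEGER, y INTEGER)")
# Hypothetical covering index: y ascending, with x included as a second key column.
conn.execute("CREATE INDEX IX_MyTable_y ON MyTable (y ASC, x)")
conn.executemany(
    "INSERT INTO MyTable (x, y) VALUES (?, ?)",
    [(i, i * 10) for i in range(100)],
)

query = "SELECT x, y FROM MyTable ORDER BY y DESC"

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
rows = conn.execute(query).fetchall()

for detail in plan:
    print(detail)
```

If SQLite needed a separate sort, the plan would contain a step mentioning a temporary B-tree for the ORDER BY; when the index is simply walked in reverse, no such step appears.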

If non-trivial:

SELECT x,y FROM MyTable mt INNER JOIN MyOtherTable mot ON mot.y = mt.y

it should depend on the sort order of MyOtherTable.y. If it's ASC as per MyTable.y then the two indexes would be read in index order and a sort applied after the join. If it's DESC, in theory a reverse order index scan could be used for the join and an additional sort wouldn't be required to satisfy your ORDER BY clause.

Edit2: Couldn't recall if this would show up in the execution plan in SQL Server. The icon doesn't indicate this is a reverse scan, nor does the tooltip on hover. Properties however shows 'Scan Direction - Backward' or checking the plan XML reveals

<IndexScan Ordered="true" --->ScanDirection="BACKWARD"<--- ForcedIndex="false" NoExpandHint="false">
            <DefinedValues>
              <DefinedValue>
                <ColumnReference Database="[TestDb]" Schema="[dbo]" Table="[MyTable]" Column="OtherId" />
              </DefinedValue>
            </DefinedValues>
            <Object Database="[TestDb]" Schema="[dbo]" Table="[MyTable]" Index="[IX_MyTable_OtherId]" />
          </IndexScan>

Sql-server – Indexing – Uniqueidentifier Foreign Key or Intermediary mapping table

Ok, I am making a lot of assumptions (INT instead of VARCHAR(50) being one of them) with this answer, so feel free to correct me if needed. The problem with option B is that it introduces a new join to relate Users to Alerts without any real added benefit. If joining on the UserID, it is best to index the UserID, so you can utilize seeks for your joins.

For Option A, UserID will be the clustering key (index key for the clustered index) on the Users table. UserID will be a nonclustered index key on Alerts table. This will cost 16 bytes per Alert.

For Option B, UserID will be the clustering key on the Users table. UserId will probably be the clustering key in UserMap too, to make joining more efficient. UserKey (assuming this is an INT) would then be a nonclustered index key on the Alerts table. This will cost 4 bytes per Alert. And 20 bytes per UserMap.

Looking at the big picture, one relationship under Option A costs 16 bytes of storage and involves 1 join operation, whereas one relationship under Option B costs 24 bytes of storage and involves 2 join operations.

Furthermore, there are approximately 340,282,366,920,938,000,000,000,000,000,000,000,000 possible uniqueidentifiers and only 4,294,967,296 INTs. Implementing a uniqueidentifier-to-INT map for this type of relationship could cause unexpected results when you start reusing INTs.
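The figures in this comparison can be checked with a few lines of Python. The byte sizes are the usual ones for a 16-byte uniqueidentifier and a 4-byte INT; the dictionary layout is purely illustrative, and the counts treat both types as plain 128-bit and 32-bit values.

```python
UNIQUEIDENTIFIER_BYTES = 16
INT_BYTES = 4

# Option A: each Alert stores the 16-byte UserID directly.
option_a = {"bytes": UNIQUEIDENTIFIER_BYTES, "joins": 1}

# Option B: each Alert stores a 4-byte UserKey, plus a UserMap row
# holding the UserID and UserKey pair (16 + 4 bytes).
option_b = {
    "bytes": INT_BYTES + (UNIQUEIDENTIFIER_BYTES + INT_BYTES),
    "joins": 2,
}

guid_space = 2 ** 128  # distinct 128-bit values
int_space = 2 ** 32    # distinct 32-bit values

print(option_a, option_b)
print(guid_space, int_space)
```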

The only reason for creating this type of map table is if you plan on creating a many-to-many relationship between Users and Alerts.

Taking all of this into consideration, I would recommend Option A.

I hope this helps,

Matt
