Index Leaf Node Order – How Entries Are Ordered

indexperformance

I am reading the book 'SQL performance explained' and when talking about indeces, it says that databases use doubly linked lists to connect index leaf nodes. Each node is stored in a database block and consists of different index entries.

It then says that the index order is maintained at two levels: one within each leaf node, and also within the different leaf nodes themselves (via the linked list), as you can see in the left side of the picture below.

enter image description here

My question is: I understand the advantages of using a doubly linked list for the blocks so that when inserting new blocks it's easier to maintain the order (it's a matter of moving around some pointers). However, within the block itself, if the order is maintained, how is that performant? Assuming than in a block there are a lot of entries, if one were to insert a new entry in the block, wouldn't that be really unperformant (because there is no data structure such as a doubly linked list).

Best Answer

The performance question you ask is a little more practical than the theory on how this works can answer. (I'm sure there's a Big O notation answer to generalize the performance but I don't know it off the top of my head.)

In practice, most database systems have different configuration options to decide when to split the block of the leaf level nodes. In Microsoft SQL Server for example, this is called Fill Factor. Fill Factor determines how full the leaf level block should be filled with data (ergo how much empty space is left available for future data inserts) on initial creation (or next rebuild of the index) to adjust how often a split operation should occur.

The previously linked article goes into the details, but in short this can be configured anywhere between 1% and 100% (in SQL Server 0% is a faux for 100%) meaning "leave the blocks 99% empty for future inserts" to "fill the block completely", and anywhere in between. This setting will affect performance in various ways, but inevitably a block split event will occur when a block is filled up and this somewhat of a heavier operation.

There is some compensation though, again in Microsoft SQL Server for example, where a small amount of Index Fragmentation occurs during this split events. This allows the split to occur more quickly at the trade-off that the physical ordering of the data may not exactly match the logical ordering of it in the B-Tree. This theoretically can affect performance during the final step of a SELECT query that uses that index, when a leaf node looks up the data page its data is stored in. Though this performance is very negligible in the real world and over time if enough index fragmentation occurs there are ways to fix it such as rebuilding or reorganizing the index.

In summary, yes there is a small amount of a performance hit when blocks need to split but modern database systems have different ways to control how often that occurs such as with the Fill Factor setting in MS SQL Server, and they lessen the performance hit by allowing fragmentation at the negligible tradeoff of querying against the index later on. They also offer ways to compensate for the tradeoffs of these optimizations by providing ways to fix the fragmentation via index rebuilding or reorganizing.