Index Leaf Node Order – How Entries Are Ordered

indexperformance

I am reading the book 'SQL performance explained' and when talking about indeces, it says that databases use doubly linked lists to connect index leaf nodes. Each node is stored in a database block and consists of different index entries.

It then says that the index order is maintained at two levels: one within each leaf node, and also within the different leaf nodes themselves (via the linked list), as you can see in the left side of the picture below.

My question is: I understand the advantages of using a doubly linked list for the blocks so that when inserting new blocks it's easier to maintain the order (it's a matter of moving around some pointers). However, within the block itself, if the order is maintained, how is that performant? Assuming than in a block there are a lot of entries, if one were to insert a new entry in the block, wouldn't that be really unperformant (because there is no data structure such as a doubly linked list).

Best Answer

The performance question you ask is a little more practical than the theory on how this works can answer. (I'm sure there's a Big O notation answer to generalize the performance but I don't know it off the top of my head.)

In practice, most database systems have different configuration options to decide when to split the block of the leaf level nodes. In Microsoft SQL Server for example, this is called Fill Factor. Fill Factor determines how full the leaf level block should be filled with data (ergo how much empty space is left available for future data inserts) on initial creation (or next rebuild of the index) to adjust how often a split operation should occur.

The previously linked article goes into the details, but in short this can be configured anywhere between 1% and 100% (in SQL Server 0% is a faux for 100%) meaning "leave the blocks 99% empty for future inserts" to "fill the block completely", and anywhere in between. This setting will affect performance in various ways, but inevitably a block split event will occur when a block is filled up and this somewhat of a heavier operation.

There is some compensation though, again in Microsoft SQL Server for example, where a small amount of Index Fragmentation occurs during this split events. This allows the split to occur more quickly at the trade-off that the physical ordering of the data may not exactly match the logical ordering of it in the B-Tree. This theoretically can affect performance during the final step of a SELECT query that uses that index, when a leaf node looks up the data page its data is stored in. Though this performance is very negligible in the real world and over time if enough index fragmentation occurs there are ways to fix it such as rebuilding or reorganizing the index.

In summary, yes there is a small amount of a performance hit when blocks need to split but modern database systems have different ways to control how often that occurs such as with the Fill Factor setting in MS SQL Server, and they lessen the performance hit by allowing fragmentation at the negligible tradeoff of querying against the index later on. They also offer ways to compensate for the tradeoffs of these optimizations by providing ways to fix the fragmentation via index rebuilding or reorganizing.

Related Solutions

Mongodb – How do databases store index key values (on-disk) for variable length fields

You can store your index as a list of fixed-size offsets into the block containing your key data. For example:

+--------------+
| 3            | number of entries
+--------------+
| 16           | offset of first key data
+--------------+
| 24           | offset of second key data
+--------------+
| 39           | offset of third key data
+--------------+
| key one |
+----------------+
| key number two |
+-----------------------+
| this is the third key |
+-----------------------+

(well, the key data would be sorted in a real example, but you get the idea).

Note that this does not necessarily reflect how index blocks are actually constructed in any database. This is merely an example of how you might organise a block of index data where the key data is of variable length.

Sql-server – Aren’t two writes required to update a clustered index record

I think the problem here is a difference in terminology.

The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.

The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.

The one thing I would pick at from the article quote is this (emphasis mine):

One write for inserting the row, and one write for updating the non-clustered index.

This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.

So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?

Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."

Best Answer

Related Solutions

Mongodb – How do databases store index key values (on-disk) for variable length fields

Sql-server – Aren’t two writes required to update a clustered index record

Related Question