What would be the maximum number of records that can be indexed with a three levels B-tree ? B+ tree

btreedatabase-sizeindexmaxtree

I am learning dynamic tree-structure organizations and how to design databases.

Consider a DBMS with the following characteristics :

  • file pages with size 2048 bytes
  • pointers of 12 bytes
  • page header of 56 bytes

A secondary index is defined on a page of 8 bytes. What would be the maximum number of records that can be indexed with a three levels B-tree? And with a three levels B+tree ?

Here are two examples of these trees :

B tree

B+ tree

My attempt

B+trees

I have read that

B+ trees are shallower than a B tree. because only the set of the highest key denoted as k in each leaf node except the last one, is stored in the non-leaf nodes, organized as a B-tree. Relational DBMS Internals, chapter 5: Dynamic Tree-Structure Organizations, p.46

Therefore there is a difference, something we store in the nodes in a B tree is stored in the leaves in a B+ tree. Thus, to my mind it was (m-1)h (m being the order and h being the height) as far as each nodes contains at most (m-1) keys to another node. But this is not linked with the number of bytes.

Yet I found in the book mentioned above the following table :

tree minmax height according to the size of pages

Therefore would it be 203.7 number of records ?

B trees

For them, as far as some values are stored in the node, I have to do a division by the number of nodes. And I'm stuck there.

Best Answer

There are many implementation options available to developers of BTree and B+Tree algorithms that will affect the answer here. In a simplistic BTree, all nodes are the same size, and when a node overflows it is split into two half-full nodes with no other key redistribution occurring. Since there will on average be a uniform distribution nodes between half-full and full, the average fill factor will be 75%. You can calculate everything else from that.

Real implementations may however redistribute keys into one or two additional adjacent nodes, which increases the average fill factor. In addition, an implementation may detect (or be notified) that a bulk insertion of pre-sorted keys is happening, and will modify the split algorithm to leave behind a trail of full nodes with only the final node being incomplete; the advantages of this behaviour should be obvious.

In a B+Tree, all key values are present in leaf nodes - so a B+Tree will have as many leaf nodes as the equivalent BTree has nodes overall. The B+Tree will also have internal nodes which contain the keys used as splitters, and the same repetition of values occurs up the tree. Actual implementations however might truncate these keys to fit more in (which changes the fan-out radically, especially at the root level), and of course key redistribution can also be done.

Many implementations use enlarged root nodes, and some allow other nodes also to expand into additional pages, to reduce the hassle of splitting and key redistribution, and to handle very large key values.

Finally many implementations cut short the process of merging nodes on deletion, to the point of only deleting nodes that become empty. There is a number of nasty edge cases with B+Trees regarding merging (consider where you delete a small key from a leaf, where that key was used as a splitter; now you need to replace that splitter with the next value that may be large, and causes the internal node to split!), so it can be easier to just drop it, and any performance impact is not often a concern. So the actual fill factor depends not only on the keys, but also on the history.

The upshot is that the question you're trying to answer is only ever asked for academic interest. It's almost never relevant to real implementations.