B Tree and B+Tree differences

btreeindextree

I'm studying B+ Tree and B Tree and I would like to understand two things about it, if someone can clarify it to me I would appreciate:

  1. Why can I store more search keys on an B+ Tree Index? My guess would be that the reason is because the nodes of an B+ Tree point out to sub-trees instead of data.

  2. Is there any type of comparison of data that will not work with an B+ Tree index or can I use all of them (=, >=, !=, <, <> …) ?

Best Answer

In the original description of a B-tree the internal nodes held not just the index keys but all columns of the table. In the B+-tree only the leaf nodes store all columns' values. As each node in the tree is a fixed size, removing non-key data from internal nodes frees space for more keys to be held.

All operators can be made to work with the BTree/ B+Tree structure. Some are more efficient than others so, in practice, the query optimizer may choose not to use the index if one of the inefficient operators is involved.

The best case is equality (=). Here the matching key intervals can be followed from root to leaf and the corresponding row(s) returned. It may even be possible to determine from the root node alone that the query will return an empty set so tree walking can stop at that point.

Range queries (<=, >=) can be efficient. The tree can be walked to a leaf for the given predicate value. If leaf pages are a doubly-linked list the remainder of the range can be found by following these leaf-to-leaf links.

Wildcard searches can use the tree, depending on where the wildcard character is. If it is in the middle of the predicate or at the end then the search can proceed as for a range query, with additional checking of each value read. If the predicate starts with the wildcard the tree cannot be used because a B+Tree is an ordered set and a leading wildcard gives no indication of where in that ordering to start.

An inequality operator (!=) can be treated as two range conditions: (value < predicate) or (value > predicate). If the distribution of values in the data is very skewed it may still be efficient to use the tree. For typical cases it is more likely the optimizer will scan the leaf nodes from start to end.