What would be the maximum number of records that can be indexed with a three-level B-tree? With a B+ tree?


I am learning dynamic tree-structure organizations and how to design databases.

Consider a DBMS with the following characteristics :

file pages with size 2048 bytes
pointers of 12 bytes
page header of 56 bytes

A secondary index is defined on a key of 8 bytes. What would be the maximum number of records that can be indexed with a three-level B-tree? And with a three-level B+ tree?

Here are two examples of these trees :

My attempt

B+trees

I have read that

B+ trees are shallower than a B-tree, because only the set of the highest keys, denoted as k, in each leaf node except the last one is stored in the non-leaf nodes, organized as a B-tree. Relational DBMS Internals, chapter 5: Dynamic Tree-Structure Organizations, p.46

So there is a difference: something that a B-tree stores in its internal nodes is stored in the leaves of a B+ tree. To my mind the answer was (m-1)^h (m being the order and h the height), since each node contains at most (m-1) keys. But this does not take the number of bytes into account.

Yet I found in the book mentioned above the following table :

Would it therefore be 20^3.7 records?

B trees

For these, since some values are stored in the internal nodes, I have to divide by the number of nodes, and that is where I am stuck.

Best Answer

There are many implementation options available to developers of BTree and B+Tree algorithms that will affect the answer here. In a simplistic BTree, all nodes are the same size, and when a node overflows it is split into two half-full nodes with no other key redistribution occurring. Since node fill levels will on average be uniformly distributed between half-full and full, the average fill factor will be 75%. You can calculate everything else from that.
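As a rough sketch of that calculation for the B-tree case, the Python below uses the numbers from the question. The node layout is an assumption, not something the question or the book excerpt confirms: each node holds n index entries (an 8-byte key plus a 12-byte record pointer) and n + 1 child pointers after the page header. The function names and the `fill` parameter are illustrative only.

```python
PAGE_SIZE = 2048     # bytes, from the question
HEADER_SIZE = 56     # bytes, from the question
POINTER_SIZE = 12    # bytes, from the question
KEY_SIZE = 8         # bytes, from the question


def btree_keys_per_node():
    """Maximum number of keys n in one B-tree node.

    Assumed layout: n entries of (key + record pointer)
    plus n + 1 child pointers, after the page header.
    """
    usable = PAGE_SIZE - HEADER_SIZE
    entry = KEY_SIZE + POINTER_SIZE
    # Solve n * entry + (n + 1) * POINTER_SIZE <= usable for n.
    return (usable - POINTER_SIZE) // (entry + POINTER_SIZE)


def btree_records(levels, fill=1.0):
    """Records indexed by a B-tree with the given number of levels.

    fill=1.0 gives the maximum; fill=0.75 approximates the
    average fill factor of a simplistic implementation.
    """
    keys = int(btree_keys_per_node() * fill)
    fan_out = keys + 1
    # Every node at every level stores keys in a B-tree.
    nodes = sum(fan_out ** level for level in range(levels))
    return nodes * keys


print(btree_keys_per_node())    # 61
print(btree_records(3))         # 238327
print(btree_records(3, 0.75))   # 97335
```

Under these assumptions a full three-level B-tree indexes 62^3 - 1 = 238,327 records. A different node layout (for example, one without record pointers in the entries) would change the result.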

Real implementations may however redistribute keys into one or two additional adjacent nodes, which increases the average fill factor. In addition, an implementation may detect (or be notified) that a bulk insertion of pre-sorted keys is happening, and will modify the split algorithm to leave behind a trail of full nodes with only the final node being incomplete; the advantages of this behaviour should be obvious.
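The effect of that modified split algorithm can be illustrated with a toy model of the leaf level only. This is a minimal sketch, not any particular engine's algorithm: `fill_leaves` and its parameters are hypothetical names, keys are assumed to arrive in ascending order, and internal nodes are ignored.

```python
def fill_leaves(keys, capacity, append_optimised):
    """Insert ascending keys into a chain of leaves; return the leaves.

    append_optimised=False: a full leaf is split into two halves.
    append_optimised=True:  a full last leaf is left intact and a
                            new leaf is started for the new key.
    """
    leaves = [[]]
    for key in keys:
        last = leaves[-1]
        if len(last) < capacity:
            last.append(key)
        elif append_optimised:
            leaves.append([key])
        else:
            mid = capacity // 2
            leaves[-1] = last[:mid]
            leaves.append(last[mid:] + [key])
    return leaves


def fill_factor(leaves, capacity):
    """Fraction of leaf slots that actually hold a key."""
    used = sum(len(leaf) for leaf in leaves)
    return used / (len(leaves) * capacity)


keys = range(1, 17)
halved = fill_leaves(keys, 4, append_optimised=False)
packed = fill_leaves(keys, 4, append_optimised=True)
print(len(halved), fill_factor(halved, 4))  # 7 leaves, about 0.57
print(len(packed), fill_factor(packed, 4))  # 4 leaves, 1.0
```

With sorted input, the plain half-split leaves every leaf except the last one half-full, while the append-optimised variant packs every leaf.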

In a B+Tree, all key values are present in leaf nodes - so a B+Tree will have as many leaf nodes as the equivalent BTree has nodes overall. The B+Tree will also have internal nodes which contain the keys used as splitters, and the same repetition of values occurs up the tree. Actual implementations however might truncate these keys to fit more in (which changes the fan-out radically, especially at the root level), and of course key redistribution can also be done.
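For the B+Tree case, a comparable sketch with the question's numbers follows. The layout is again assumed rather than taken from the source: internal nodes hold n full-length keys and n + 1 child pointers, leaves hold (key, record pointer) entries, and no key truncation or sibling pointers are modelled. The function names are illustrative.

```python
PAGE_SIZE = 2048     # bytes, from the question
HEADER_SIZE = 56     # bytes, from the question
POINTER_SIZE = 12    # bytes, from the question
KEY_SIZE = 8         # bytes, from the question


def bplus_fan_out():
    """Maximum number of children of an internal B+ tree node.

    Assumed layout: n keys and n + 1 child pointers.
    """
    usable = PAGE_SIZE - HEADER_SIZE
    keys = (usable - POINTER_SIZE) // (KEY_SIZE + POINTER_SIZE)
    return keys + 1


def bplus_leaf_entries():
    """Maximum number of (key, record pointer) entries in a leaf."""
    usable = PAGE_SIZE - HEADER_SIZE
    return usable // (KEY_SIZE + POINTER_SIZE)


def bplus_records(levels):
    """Records indexed when only the lowest level stores entries."""
    leaves = bplus_fan_out() ** (levels - 1)
    return leaves * bplus_leaf_entries()


print(bplus_fan_out())        # 100
print(bplus_leaf_entries())   # 99
print(bplus_records(3))       # 990000
```

Under these assumptions a full three-level B+Tree indexes 100 × 100 × 99 = 990,000 records. Reserving one 12-byte sibling pointer in each leaf would still leave room for 99 entries.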

Many implementations use enlarged root nodes, and some allow other nodes also to expand into additional pages, to reduce the hassle of splitting and key redistribution, and to handle very large key values.

Finally, many implementations cut short the process of merging nodes on deletion, to the point of only deleting nodes that become empty. There are a number of nasty edge cases with B+Trees regarding merging (consider the case where you delete a small key from a leaf and that key was used as a splitter; now you need to replace that splitter with the next value, which may be large and cause the internal node to split!), so it can be easier to just skip merging, and any performance impact is rarely a concern. So the actual fill factor depends not only on the keys, but also on the history.

The upshot is that the question you're trying to answer is only ever asked for academic interest. It's almost never relevant to real implementations.

Related Solutions

Sql-server – B-tree node split strategy in SQL Server for monotonically increasing value

When SQL Server adds a row at the end of the index, it just allocates a new page for the row rather than splitting the current last page. Experimental evidence for this is below (it uses %%physloc%% with sys.fn_PhysLocFormatter, which requires SQL Server 2008). See also the discussion here.

CREATE TABLE T
(
id int identity(1,1) PRIMARY KEY,
filler char(1000)
)
GO

INSERT INTO T
DEFAULT VALUES
GO 7

GO
SELECT sys.fn_PhysLocFormatter(%%physloc%%)
FROM T

GO

INSERT INTO T
DEFAULT VALUES

GO

SELECT sys.fn_PhysLocFormatter(%%physloc%%)
FROM T
GO

DROP TABLE T

Returns (Your results will vary)

(1:173:0) /*File:Page:Slot*/
(1:173:1)
(1:173:2)
(1:173:3)
(1:173:4)
(1:173:5)
(1:173:6)
(1:110:0) /*Final insert is on a new page*/

This only appears to apply to leaf nodes, though. This can be seen by running the script below and adjusting the TOP value. For me, 622/623 was the cut-off point between requiring one and two first-level pages (this might vary if you have snapshot isolation enabled?). At this level the page is split in a balanced manner, which leads to wasted space.

USE tempdb;

CREATE TABLE T2
(
id int identity(1,1) PRIMARY KEY CLUSTERED,
filler char(8000)
)

INSERT INTO T2(filler)
SELECT TOP 622 'A'
FROM master..spt_values v1,  master..spt_values v2

DECLARE @index_info  TABLE
(PageFID  VARCHAR(10), 
  PagePID VARCHAR(10),   
  IAMFID   tinyint, 
  IAMPID  int, 
  ObjectID  int,
  IndexID  tinyint,
  PartitionNumber tinyint,
  PartitionID bigint,
  iam_chain_type  varchar(30),    
  PageType  tinyint, 
  IndexLevel  tinyint,
  NextPageFID  tinyint,
  NextPagePID  int,
  PrevPageFID  tinyint,
  PrevPagePID int, 
  Primary Key (PageFID, PagePID));

INSERT INTO @index_info 
    EXEC ('DBCC IND ( tempdb, T2, -1)'  ); 

DECLARE @DynSQL nvarchar(max) = 'DBCC TRACEON (3604);'
SELECT @DynSQL = @DynSQL + '
DBCC PAGE(tempdb, ' + PageFID + ', ' + PagePID + ', 3); '
FROM @index_info     
WHERE IndexLevel = 1

SET @DynSQL = @DynSQL + '
DBCC TRACEOFF(3604); '

EXEC(@DynSQL)


DROP TABLE T2
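The 622/623 cut-off is consistent with a simple page-capacity estimate. The sketch below is only a back-of-the-envelope check: the 8192-byte page and 96-byte page header are standard SQL Server figures, but the per-row layout (1 status byte, a 4-byte int key, a 6-byte child page pointer, and a 2-byte slot array entry) is an assumption about non-leaf index rows that the answer itself does not state.

```python
PAGE_BYTES = 8192          # SQL Server page size
PAGE_HEADER_BYTES = 96     # SQL Server page header size

# Assumed non-leaf index row for an int clustered key:
# 1 status byte + 4-byte key + 6-byte child page pointer.
INDEX_ROW_BYTES = 1 + 4 + 6
SLOT_ENTRY_BYTES = 2       # assumed slot array entry per row

usable = PAGE_BYTES - PAGE_HEADER_BYTES
rows_per_page = usable // (INDEX_ROW_BYTES + SLOT_ENTRY_BYTES)
print(rows_per_page)       # 622
```

Since each char(8000) row fills its own leaf page, 622 leaf pages fit under a single first-level page in this estimate, and the 623rd would force a second one.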

Sql-server – Aren’t two writes required to update a clustered index record

I think the problem here is a difference in terminology.

The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.

The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.

The one thing I would pick at from the article quote is this (emphasis mine):

One write for inserting the row, and one write for updating the non-clustered index.

This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.

So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?

Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."