# B-Tree Insertion Algorithm


```
Preliminaries
• Multiway trees have nodes that can have more than two children.
Multiway trees of order k have nodes with at most k children.
• 2-3-4 Trees
– For all non-leaf nodes, a node with
• One data item has two pointers
• Two data items has three pointers
• Three data items has four pointers
– Children of pointer p have keys less than data item p.
– Children of the last pointer contain keys greater than the last data item.
• B-Trees (Balanced, Boeing, broad, bushy, or Bayer (for Rudolf Bayer)??)
– Each node contains links to as many children as can fit in a disk block.
Node Structures
• 2-3-4 tree           • B-Tree

class Node             class Node
{                      {
Item[3] items;         Item[k] items;
Node[4] nodes;         Node[k+1] nodes;
}                      }
2-3-4 Insertion Algorithm
• Insert( node )
If node is full Then Call splitNode
If key is found in node, then Return “DuplicatesNotAllowed”
If this is a leaf node, Insert the Data item and Return
Call Insert(appropriateChildPointer)
• SplitNode
Allocate a newNode and move the right data item (with its two child pointers) to it
If parent exists Then
Insert the middle data item into the parent node, with the pointer after it pointing to newNode
Else
Allocate new Root containing the middle data item of node
root’s firstChildPointer points to node
root’s secondChildPointer points to newNode
2-3-4 Deletion Algorithm
• Find the node to delete. If it is not a leaf node, replace its
data by its successor, and then remove the successor.
• Cases to consider after deleting a key from a 2-3-4 node:
1.   If keys remain in the node, break.
2.   If there is a sibling with more than one entry, then promote the
sibling entry to the parent node, and demote the parent entry to
be in this node.
3.   If sibling nodes have only one entry, remove the current node
from the tree. Next, merge the parent entry into the sibling node,
possibly creating a hole at the parent. Recursively work up the
tree, applying steps 1, 2, and 3 as needed. If the root node
becomes empty, simply remove it from the tree.
Visual Illustration of the 2-3-4-Delete

Case 1 (delete 22):      11, 22, 33          -->      11, 33

Case 2 (delete 12):      11, 22, 33          -->      09, 22, 33
                      08, 09      12                08        11

Case 3 (delete 12):          11              -->      08, 11
                         08      12

The algorithm recursively works its way up the tree
Characteristics of External Storage
• Access is at least three orders of magnitude slower
than main memory.
• The extra overhead of searching through multiway
tree nodes is more than offset, because a shallower
tree means fewer disk accesses.
• It is desirable to design the record sizes with disk
block sizes in mind. Each disk read/write will be
in multiples of its block size.
B-Tree Insertion Algorithm
• Differences from the 2-3-4 algorithm
– Node splitting is from the bottom up rather than the top
down.
• Advantage: The tree is kept more full.
• Disadvantage: A pass down the tree could be followed by a pass
back up if multiple splits are necessary.
– Half of the items go to the new node, half remain in the
old node.
– The middle key is promoted to the next level up.
– Contraction occurs when a node and a sibling together hold
fewer data items than fit in one full block.

Note: Standard B-tree implementations require nodes to be at least half full.
External Storage Optimizations
• It is more efficient to keep the index and data separate
– Separate indices allow for multi-keyed files
• Refinements exist to guarantee that no node is less than 2/3
full. Nodes are balanced over three siblings.
• Some implementations only have data pointers at the last level.
• A linked list of free disk blocks is often used to reclaim storage
space after deletions.
• Efficiency: Assume a block contains 8096 bytes, each key is 24
bytes, the blocks are half full, and the pointers require 4 bytes.
How many levels deep is the tree?
Other External Storage Algorithms
• Create binary tree in memory for the index
• Sorting external data with a type of merge sort
– On each pass
•   Read large block from each piece of the file
•   Perform merge
•   Write back to second file
•   Keep reading blocks from each half until they run out.
– There will be log_k N merges, where k is the number
of data elements that can fit in the memory blocks.

```
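
The 2-3-4 insertion steps in the slides (split full nodes on the way down, reject duplicates, insert at a leaf) can be sketched in Python roughly as follows. This is a minimal illustration, not the slides' own implementation: the names `Node`, `Tree234`, `_split`, and `in_order` are assumptions, and a `ValueError` stands in for the "DuplicatesNotAllowed" result.

```python
from bisect import bisect_left


class Node:
    """A 2-3-4 node: up to 3 sorted keys and, if internal, len(keys) + 1 children."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # an empty list marks a leaf

    def is_leaf(self):
        return not self.children

    def is_full(self):
        return len(self.keys) == 3


class Tree234:
    def __init__(self):
        self.root = Node()

    def _split(self, node, parent):
        """Split a full node around its middle key; return the node now holding that key."""
        left_key, middle_key, right_key = node.keys
        # The new right sibling takes the largest key and the two rightmost children.
        new_node = Node([right_key], node.children[2:])
        node.keys = [left_key]
        node.children = node.children[:2]
        if parent is None:
            # Splitting the root: the tree grows one level taller.
            parent = Node([middle_key], [node, new_node])
            self.root = parent
        else:
            # The parent has room, because full nodes are split on the way down.
            i = parent.children.index(node)
            parent.keys.insert(i, middle_key)
            parent.children.insert(i + 1, new_node)
        return parent

    def insert(self, key):
        """Insert key, raising ValueError if it is already present."""
        node, parent = self.root, None
        while True:
            if node.is_full():
                node = self._split(node, parent)  # resume from the promoted key's node
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                raise ValueError("DuplicatesNotAllowed")
            if node.is_leaf():
                node.keys.insert(i, key)
                return
            node, parent = node.children[i], node


def in_order(node):
    """Return all keys under node in ascending order."""
    if node.is_leaf():
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(in_order(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out
```

Because every full node is split before the search descends past it, a split never has to propagate back up the tree, which is the contrast the slides draw with bottom-up B-tree splitting.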
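
The efficiency question in the slides (8096-byte blocks, 24-byte keys, 4-byte pointers, half-full blocks) does not say how many keys are stored, so the depth can only be worked out for an assumed key count. The sketch below treats each node entry as one key plus one pointer, which is an approximation, and the function names and the one-million-key figure are illustrative assumptions rather than values from the slides.

```python
BLOCK_BYTES = 8096    # block size from the slide
KEY_BYTES = 24        # key size from the slide
POINTER_BYTES = 4     # pointer size from the slide
FILL_FACTOR = 0.5     # "the blocks are half full"


def keys_per_node(block=BLOCK_BYTES, key=KEY_BYTES, ptr=POINTER_BYTES, fill=FILL_FACTOR):
    """Approximate keys in one node, counting each entry as one key plus one pointer."""
    entries_when_full = block // (key + ptr)
    return int(entries_when_full * fill)


def levels_needed(num_keys, per_node):
    """Smallest number of levels whose total key capacity reaches num_keys."""
    fan_out = per_node + 1  # a node with m keys has m + 1 children
    levels, capacity, nodes_on_level = 0, 0, 1
    while capacity < num_keys:
        levels += 1
        capacity += nodes_on_level * per_node
        nodes_on_level *= fan_out
    return levels


per_node = keys_per_node()                      # 8096 // 28 = 289 entries, halved to 144
print(per_node, levels_needed(1_000_000, per_node))  # assumed file of one million keys
```

Under these assumptions each node holds 144 keys and fans out to 145 children, so three levels already cover about three million keys, and a one-million-key index would be three levels deep.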