              Trees (and sets, and hashing) in Java
              Brian Palmer

DSA/BP/2006       Trees 2          1
              No duplication
The structures we have looked at so far (stack,
  queue, and list) have all allowed duplication
For example, a stack of opening brackets that only
  allowed one instance of each kind of bracket
  would not have been much use in the stack
On the other hand, we sometimes want a structure
  not to allow duplication: all elements must be
  unique

          Reasons for uniqueness
There are various theoretical reasons, not to
  do with computing, for the usefulness of
  collections of unique elements
In practical terms, it's a matter of searching
In many applications, we want to search for
  one particular element
Searching a list returns, by default, only the
  first matching element. We can't be sure
  that's the element we want.
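The point about lists can be seen in a few lines. This is only a sketch: the class name FirstMatchDemo, the helper firstMatch and the sample surnames are made up for illustration, while indexOf and lastIndexOf are the standard java.util.List methods.

```java
import java.util.Arrays;
import java.util.List;

class FirstMatchDemo {
    // Returns the position of the first element equal to target, or -1 if there is none.
    static int firstMatch(List<String> items, String target) {
        return items.indexOf(target);
    }

    public static void main(String[] args) {
        // "Smith" appears twice; a list is happy to hold both.
        List<String> surnames = Arrays.asList("Smith", "Jones", "Smith");
        System.out.println(firstMatch(surnames, "Smith"));  // position of the first "Smith"
        System.out.println(surnames.lastIndexOf("Smith"));  // the second one has to be asked for separately
    }
}
```

With duplicates allowed, nothing tells us which of the two "Smith" entries the caller meant.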
In Java, a set is (almost inevitably) an interface
A set is a collection that contains no duplicate
  elements.
More formally, sets contain no pair of elements e1
  and e2 such that e1.equals(e2), and at most one
  null element.
As implied by its name, this interface models the
  mathematical set abstraction.
(These three paragraphs are copied straight from
  the Java documentation)

              The Set <E> interface
This contains methods such as add, clear,
   contains, equals, isEmpty, iterator, remove, and
   size
If you attempt to add an element e1 to a set that
   already contains e1, no change will take place
In a set, the entire element provides the
This is fine when we're dealing with simple
   elements, but it can make reference awkward
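A minimal sketch of the no-duplication behaviour, assuming java.util.HashSet as the implementation (the slide does not name one). The class name SetDemo and the helper countNew are invented for the example; add returning false for an element already present is standard Set behaviour.

```java
import java.util.HashSet;
import java.util.Set;

class SetDemo {
    // Adds each word to the set and reports how many were genuinely new.
    static int countNew(Set<String> set, String... words) {
        int added = 0;
        for (String w : words) {
            if (set.add(w)) {   // add returns false if w was already in the set
                added++;
            }
        }
        return added;
    }

    public static void main(String[] args) {
        Set<String> brackets = new HashSet<>();
        int added = countNew(brackets, "(", "[", "(", "{", "[");
        System.out.println(added + " added, size " + brackets.size());
    }
}
```

Five calls to add leave only three elements in the set: the repeated "(" and "[" change nothing.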

Map is yet another Java interface
A map is an object that maps keys to values. A
  map cannot contain duplicate keys; each key
  can map to at most one value.
For example, in a collection of persons, each might
  have a unique id, which would be used as the
  key, with the other relevant data making up the
  value to be stored
Likewise, a collection of policies (for insurance,
  say) would use the policy number as the key,
  and the rest of the data as the value

         The Map<K,V> interface
The interface has such methods as these:
    containsKey(Object key)
    containsValue(Object value)
    get(Object key)
    put(K key, V value)
    remove(Object key)
It offers no iterator, but it does offer an entrySet(),
   which is a kind of Set and does offer an iterator
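The methods listed above can be tried out as follows. This is a sketch only: it assumes java.util.HashMap as the implementation, and the class name MapDemo, the policy numbers and the descriptions are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class MapDemo {
    // Builds a small map from policy number (the key) to description (the value).
    static Map<Integer, String> samplePolicies() {
        Map<Integer, String> policies = new HashMap<>();
        policies.put(1001, "house");
        policies.put(1002, "car");
        policies.put(1001, "house and contents"); // same key again: the old value is replaced
        return policies;
    }

    public static void main(String[] args) {
        Map<Integer, String> policies = samplePolicies();
        System.out.println(policies.get(1001));
        System.out.println(policies.containsKey(1003));
        // No iterator on the map itself, so we go through the entry set.
        for (Map.Entry<Integer, String> e : policies.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Putting a second value under key 1001 does not create a second entry; the map still holds two keys.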

              Speed of access
We are discussing collections where the most
 common activity will be searching; where
 searching will be more common than, say,
 adding and removing
We have looked, in theory, at binary search trees,
 where searching is O(log n) in both the worst and
 the expected cases, provided the tree stays balanced
We should take a quick look at hashing, whose
 worst case is dreadful, O(n), but whose
 expected case is very fast, O(1)

               Hash tables
We can regard a hash table as in effect an array
The parts of the array are known, for some reason,
  as buckets
As in maps, things consist of a key and a value
The key is hashed, and the resulting hashcode is
  used to determine which bucket the value
  should be stored in
To retrieve a value, the key is hashed again and
  the relevant bucket is found
This idea is used so often that every object in
  Java has a hashCode() method
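One simple way to turn a hashcode into a bucket number is to take the remainder on division by the number of buckets. The sketch below does exactly that; it is a simplification for illustration, not a description of what java.util.HashMap does internally. The class name BucketDemo, the helper bucketIndex and the table size of 16 are assumptions.

```java
class BucketDemo {
    // Maps a key to one of bucketCount buckets.
    // floorMod keeps the result non-negative even when hashCode() is negative.
    static int bucketIndex(Object key, int bucketCount) {
        return Math.floorMod(key.hashCode(), bucketCount);
    }

    public static void main(String[] args) {
        int buckets = 16; // hypothetical table size
        System.out.println(bucketIndex("Palmer", buckets));
        // An Integer's hashcode is its own value, so 21 lands in bucket 21 mod 16 = 5.
        System.out.println(bucketIndex(Integer.valueOf(21), buckets));
    }
}
```

Hashing the same key again always gives the same bucket, which is what makes retrieval work.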

        The problem with hashing
If all goes well, we go straight to the right bucket
However, different keys might produce hashcodes
   that lead to the same bucket
In that case, there's some work to do, which will be
   discussed in DSA2
In Java's HashMap, the array is by default never more
   than three-quarters full: the default load factor is 0.75
This gives a workable balance between wasting
   space and wasting time on rehash operations
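Two different keys with the same hashcode are easy to find: the strings "Aa" and "BB" both hash to 2112 under String.hashCode(). The sketch below stores both in a HashMap built with an explicit initial capacity of 16 and a load factor of 0.75 (HashMap's documented defaults, written out here). The class name CollisionDemo and its methods are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // True when two keys produce the same hashcode, and so must share a bucket.
    static boolean sameHash(Object a, Object b) {
        return a.hashCode() == b.hashCode();
    }

    // Builds a map holding two colliding keys.
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>(16, 0.75f); // capacity 16, load factor 0.75
        map.put("Aa", 1);
        map.put("BB", 2);
        return map;
    }

    public static void main(String[] args) {
        Map<String, Integer> map = build();
        System.out.println(sameHash("Aa", "BB"));
        // Despite the collision, each key still finds its own value.
        System.out.println(map.get("Aa") + " " + map.get("BB"));
    }
}
```

The map copes with the collision, at the cost of a little extra work inside the shared bucket.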

              Red-black trees
A red-black tree is a kind of binary search tree
In a red-black tree, every element is regarded as
  having an associated colour, either red or black
The root is black
Red rule: If an element is red, none of its children
  can be red: they must be black
Path rule: the number of black elements must be
  the same in all paths from the root to elements
  with no children or with one child

              How the rules work out
If a red element has any children, it must have two
Almost all non-leaves have two children
Red-black trees are therefore good and bushy
For n elements, the height is O(log n), even in the
   worst case (it is at most 2 log2(n + 1))
This makes it good for searching
Keeping the rules can be fiddly when adding or
   removing an element, but can still be done in
   O(log n) time
                   A red-black tree

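[Figure: a red-black tree with 37 and 67 on the second level and
 11, 41, 61 and 73 on the third; the root value and the node
 colours are missing.]
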

     Another version of the red-black rules
1.    Every node is either red or black
2.    The root is black
3.    If a node is red, its children must be black
4.    Every path from a node to a null link must
      contain the same number of black nodes
•     This set of rules is drawn from Data
      Structures... by Weiss, who shows a red-black
      tree after the insertion sequence 10, 85, 15,
      70, 20, 60, 30, 50, 65, 80, 90, 40, 5, 55
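The four rules above translate fairly directly into a checker. What follows is a sketch under assumptions: the class RedBlackCheck, its Node type and the field names are invented here, not taken from Weiss or from the Java library. Rule 1 is enforced by the boolean field; the checker does not test the binary search tree ordering of the keys.

```java
class RedBlackCheck {
    // Hypothetical node type: a key, a colour flag, and two child links (null when absent).
    static final class Node {
        final int key;
        final boolean red;
        final Node left, right;

        Node(int key, boolean red, Node left, Node right) {
            this.key = key;
            this.red = red;
            this.left = left;
            this.right = right;
        }
    }

    static boolean isRed(Node n) {
        return n != null && n.red; // a null link counts as black
    }

    // Returns the number of black nodes on every path from n down to a null link,
    // or -1 if rule 3 or rule 4 is broken somewhere at or below n.
    static int blackHeight(Node n) {
        if (n == null) {
            return 0;
        }
        if (n.red && (isRed(n.left) || isRed(n.right))) {
            return -1; // rule 3: a red node must not have a red child
        }
        int l = blackHeight(n.left);
        int r = blackHeight(n.right);
        if (l < 0 || r < 0 || l != r) {
            return -1; // rule 4: both sides must contain the same number of black nodes
        }
        return l + (n.red ? 0 : 1);
    }

    // Rule 2 (black root) plus rules 3 and 4.
    static boolean isValid(Node root) {
        return !isRed(root) && blackHeight(root) >= 0;
    }
}
```

A black 20 with red children 10 and 30 passes; a black 20 with a single black child 10 fails rule 4, because the path through 10 has one more black node than the path through the missing right child.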

              Yet another version
Root property: the root is black
External property: every external node is black (null nodes
   are black)
Internal property: the children of a red node are black
Depth property: all external nodes must have the same
   black depth, defined as the number of black ancestors
   minus one (a node is an ancestor of itself)
(These are from Data Structures etc by Goodrich and
   Tamassia, who use the sequence 4, 7, 12, 15, 3, 5, 14,
   18, 16, 17)

 Putting things into a red-black tree
A new node finds its place in ordinary binary
    search tree fashion
We often have then to shift things around in the
    tree to fit the rules
All new nodes are red
If a new node's parent is also red, we have a problem
This is sometimes called a double-red problem
(Removing a node is even more awkward, and we
    won't be looking at it in detail.)
             When the uncle is black
 • When the sibling of the parent is black (or
   null) we have these possibilities:
   [Diagram: grandparent 10, with the uncle as one child and the
   parent 20 as the other; the new node 30 is a child of the parent.]

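• With the straight line groups, we simply
  rotate and switch the colours of the old
  and new apex (grandparent and parent).
  For example, with grandparent 10, parent 20
  and new node 30 in a straight line, rotation
  puts 20 at the apex with 10 and 30 as its
  children; 20 becomes black and 10 becomes red.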
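The rotate-and-recolour step for a straight-line group might be coded like this. It is a sketch under assumptions: the class RotationDemo, its Node type and the method rotateLeft are invented for illustration, the line is taken to lean to the right (the mirror image needs a matching rotateRight), and re-attaching the new apex to the rest of the tree is left to the caller.

```java
class RotationDemo {
    // Hypothetical mutable node: key, colour flag, child links.
    static final class Node {
        final int key;
        boolean red;
        Node left, right;

        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    // Rotates a right-leaning line (grandparent -> parent -> node) to the left
    // and swaps the colours of the old apex (grandparent) and new apex (parent).
    // Returns the new apex, which the caller must link in where the grandparent was.
    static Node rotateLeft(Node grandparent) {
        Node parent = grandparent.right;
        grandparent.right = parent.left; // parent's inner subtree moves across
        parent.left = grandparent;       // grandparent drops down to the left
        boolean tmp = parent.red;        // swap the two colours
        parent.red = grandparent.red;
        grandparent.red = tmp;
        return parent;
    }

    public static void main(String[] args) {
        Node g = new Node(10, false);    // black grandparent
        g.right = new Node(20, true);    // red parent
        g.right.right = new Node(30, true); // red new node: the double-red problem
        Node apex = rotateLeft(g);
        System.out.println(apex.key + " " + apex.left.key + " " + apex.right.key);
    }
}
```

After the call the apex is a black 20 with red children 10 and 30, so the double-red has gone and the number of black nodes on each path is unchanged.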

                        Double rotation
• With the bent-line groups, we need first to
  get them into a straight line. For example,

          30                  30
         /                   /
       10         ->       20
         \                /
          20            10

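A double rotation can be sketched as two plain rotations followed by the colour switch. Everything named here (DoubleRotationDemo, Node, fixLeftRight) is an assumption made for the example; it handles only the shape where the parent is a left child and the new node is the parent's right child, and the mirror-image case would swap left and right throughout.

```java
class DoubleRotationDemo {
    // Hypothetical mutable node: key, colour flag, child links.
    static final class Node {
        final int key;
        boolean red;
        Node left, right;

        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    // Plain rotations: they change shape only, not colour.
    static Node rotateLeft(Node x) {
        Node y = x.right;
        x.right = y.left;
        y.left = x;
        return y;
    }

    static Node rotateRight(Node x) {
        Node y = x.left;
        x.left = y.right;
        y.right = x;
        return y;
    }

    // Fixes a bent line: grandparent.left is the parent, parent.right is the new node.
    // Returns the new apex, to be linked in where the grandparent was.
    static Node fixLeftRight(Node grandparent) {
        grandparent.left = rotateLeft(grandparent.left); // first rotation: straighten the line
        Node apex = rotateRight(grandparent);            // second rotation: new node becomes the apex
        apex.red = false;                                // new apex turns black
        grandparent.red = true;                          // old apex turns red
        return apex;
    }

    public static void main(String[] args) {
        Node g = new Node(30, false);      // black grandparent
        g.left = new Node(10, true);       // red parent
        g.left.right = new Node(20, true); // red new node, on the inside
        Node apex = fixLeftRight(g);
        System.out.println(apex.key + " " + apex.left.key + " " + apex.right.key);
    }
}
```

Starting from 30, 10, 20 in a bent line, the result is a black 20 at the apex with red children 10 and 30.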

               When the uncle is red
• We do a recolouring: parent and uncle
  switch from red to black, grandparent
  switches from black to red (unless it's the
  root, which stays black). The shape of the
  tree does not change:

            30                    30
           /  \                  /  \
         20    40      ->      20    40
        /                     /
      10                    10

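Recolouring needs no rotation at all, only three colour changes. The sketch below assumes a hypothetical Node type and a method recolour invented for the example; whether the grandparent is the root is passed in as a flag to keep the code short.

```java
class RecolourDemo {
    // Hypothetical mutable node: key, colour flag, child links.
    static final class Node {
        final int key;
        boolean red;
        Node left, right;

        Node(int key, boolean red) {
            this.key = key;
            this.red = red;
        }
    }

    // Handles the red-uncle case: both children of the grandparent
    // (the parent and the uncle) are red and are turned black.
    static void recolour(Node grandparent, boolean grandparentIsRoot) {
        grandparent.left.red = false;
        grandparent.right.red = false;
        grandparent.red = !grandparentIsRoot; // the root must stay black
    }

    public static void main(String[] args) {
        Node g = new Node(30, false);      // black grandparent
        g.left = new Node(20, true);       // red parent
        g.right = new Node(40, true);      // red uncle
        g.left.left = new Node(10, true);  // red new node
        recolour(g, false);
        System.out.println(g.red + " " + g.left.red + " " + g.right.red + " " + g.left.left.red);
    }
}
```

If the grandparent turns red and its own parent is red too, the double-red problem has simply moved two levels up the tree and has to be dealt with again there.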

An excellent animation of red-black trees can be found at
The rules they list are:
1. Every node has a value.
2. The value of any node is greater than the value of its
    left child and less than the value of its right child.
3. Every node is colored either red or black.
4. Every red node that is not a leaf has only black children.
5. Every path from the root to a leaf contains the same
    number of black nodes.
6. The root node is black.

SortedMap is a Java interface that extends the
   Map interface
It guarantees that elements are in ascending
   key order
The keys must implement the Comparable
   interface (or a Comparator must be supplied)
Strings and Integers implement the
   Comparable interface and make natural
   keys
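A short sketch of what a sorted map offers beyond an ordinary map, using String keys (which implement Comparable) and java.util.TreeMap as the implementation. The class name SortedMapDemo and the sample names and numbers are invented; firstKey, lastKey and headMap are standard SortedMap methods.

```java
import java.util.SortedMap;
import java.util.TreeMap;

class SortedMapDemo {
    // Entries are put in no particular order; the map keeps them sorted by key.
    static SortedMap<String, Integer> sample() {
        SortedMap<String, Integer> ages = new TreeMap<>();
        ages.put("Carol", 41);
        ages.put("Alice", 37);
        ages.put("Bob", 67);
        return ages;
    }

    public static void main(String[] args) {
        SortedMap<String, Integer> ages = sample();
        System.out.println(ages.firstKey());      // smallest key
        System.out.println(ages.lastKey());       // largest key
        System.out.println(ages.headMap("Bob"));  // the entries whose keys come before "Bob"
        System.out.println(ages.keySet());        // keys in ascending order
    }
}
```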
              Tree collections
The collections framework in Java has two
  explicit tree classes: TreeSet and
  TreeMap
In a TreeSet, the element is the basis of
In a TreeMap, the keys are unique (though
  the values stored under those keys might
  very well be the same or similar)
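A quick sketch of a TreeSet at work: duplicates are dropped and the elements come back in ascending order. The class name TreeSetDemo and the helper build are made up for the example, and the numbers are arbitrary.

```java
import java.util.Arrays;
import java.util.TreeSet;

class TreeSetDemo {
    // Builds a TreeSet from the given values; repeated values are stored once.
    static TreeSet<Integer> build(Integer... values) {
        return new TreeSet<>(Arrays.asList(values));
    }

    public static void main(String[] args) {
        TreeSet<Integer> set = build(61, 11, 73, 37, 11, 41, 67);
        System.out.println(set);         // prints [11, 37, 41, 61, 67, 73]
        System.out.println(set.size());  // 6, because 11 was given twice
    }
}
```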
A TreeMap is an implementation of the
   SortedMap interface
It is based on a red-black tree
Look-up works as in a phone book: you
   have a key, and use it to find a value
TreeMap does not supply an iterator, but it
   does supply what is called a set view of
   the mappings (entrySet())
This is a set, and a set provides an iterator
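The phone-book idea, and the route to an iterator through entrySet(), can be sketched as follows. The class name PhoneBookDemo, the helper listing and the names and numbers are all invented for illustration. Two people share a number here, which is allowed because only the keys have to be unique.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PhoneBookDemo {
    // Walks the set view of the mappings; a TreeMap hands entries over in ascending key order.
    static List<String> listing(TreeMap<String, String> book) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> e : book.entrySet()) {
            lines.add(e.getKey() + ": " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        TreeMap<String, String> book = new TreeMap<>();
        book.put("Palmer", "555 0101");
        book.put("Adams", "555 0199");
        book.put("Brown", "555 0101");   // same value as Palmer, different key
        System.out.println(book.get("Brown")); // look-up by key
        for (String line : listing(book)) {
            System.out.println(line);
        }
    }
}
```

The for-each loop uses the iterator that the entry set provides, so the listing comes out as Adams, Brown, Palmer.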