# Disjoint Set or Union-Find ADT

```
[ Section 4.2 ]
Objects: A collection of nonempty disjoint sets S = {S1, S2, ..., Sk},
i.e., each Si is a nonempty set that has no element in common
with any other Sj. In mathematical notation this is:

    Si ∩ Sj = ∅, ∀ i ≠ j

Each set is identified by a unique element called its representative.
Operations:

• MAKE-SET(x): Given an element x that does not already
  belong to one of the sets, create a new set {x} that contains only x
  (so x is necessarily its representative).

• FIND-SET(x): Given an element x, return the representative of the
  set that contains x.

• UNION(x,y):
  – Given two distinct elements x and y,
  – let Sx be the set that contains x and Sy be the set that
    contains y.
  – UNION(x,y) is an operation that results in a new set
    consisting of Sx ∪ Sy.

  Two things need to be updated to maintain a valid collection of
  sets:

  1. Remove Sx and Sy from the collection (since all the sets must be
     disjoint) and add Sx ∪ Sy.

  2. Pick a representative for the new set.

  Note: If both x and y belong to the same set already (i.e., Sx =
  Sy), nothing is done by this operation.

Applications

• Maintaining the set of connected components of a graph.

• Maintaining lists of duplicate copies of webpages.

• Constructing a minimum spanning tree for a graph (Kruskal).

Finding connected components:

    For all v in V do
        MAKE-SET(v)
    For all (u,v) in E do
        UNION(u,v)

Q: How can we test whether u and v are connected?

    Check whether FIND-SET(u) == FIND-SET(v).

Kruskal’s Algorithm for MSTs
Intuition: Grow an MST A by repeatedly adding the “lightest”
edge from E that does not create a cycle.

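Example

[Figure: a weighted graph with a vertex labelled s and edge weights
2, 4, 5, 6, 8, 9, 10.]

Order of edges in priority queue: 2,4,5,6,8,9,10

[Figure: two further snapshots of the same graph as the algorithm runs.]
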

KRUSKAL-MST(G=(V,E),w:E->Z)
    A := {};
    insert the edges into a priority queue Q;
    for each vertex v in V
        MAKE-SET(v);
    while (Q not empty)
        e = EXTRACT-MIN(Q);    /* e = (u,v) */
        if FIND-SET(u) =/= FIND-SET(v) then
            UNION(u,v);
            A := A U {e};
        end if
    end while
    return A;
END KRUSKAL-MST
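Q: Which line checks that adding edge e to A does not create a
cycle?

    The test FIND-SET(u) =/= FIND-SET(v): if u and v are already in the
    same set, they are already connected in A, so adding e would close
    a cycle.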

Data Structures for Disjoint Sets

1. Arrays

2. Linked lists

3. Trees

Arrays

• One position for each element.

• Each position stores the element and an index to the set
representative. For example, the collection of sets
{{A}, {B, E}, {C, F, G}, {D}}

could be represented using the following array:

    index:           1   2   3   4   5   6   7
    element:         A   B   C   D   E   F   G
    representative:  1   2   6   4   2   6   6

Operations

• MAKE-SET(x): store x in the next available location, with its own
  position as its index value. Takes time O(1).

• FIND-SET(x): the index value stored with x indicates the
  representative of the set containing x, so FIND-SET(x)
  takes time O(1).

• UNION(x,y): Let X be the index value stored with x and
  Y be the index value stored with y. If X ≠ Y, then
  go through every element in the array and replace every index value
  equal to Y with X (replacing X with Y would work equally well).
  Example: with this convention, UNION(A,E) results in:

    index:           1   2   3   4   5   6   7
    element:         A   B   C   D   E   F   G
    representative:  1   1   6   4   1   6   6

Worst-case sequence complexity for m operations:
Upper bound:

• In terms of n, the number of elements, each operation takes
time at most O(n)

• How big can n get? O(m)

• Therefore, any sequence of m operations takes time O(m^2).
  Note that this is an overestimate.

Lower bound:

• What is the worst possible sequence of m operations? Do
  m/2 MAKE-SETs followed by m/2 − 1 UNIONs.

  – each UNION will take time Ω(m/2) (the number of elements)

  – so the sequence will take time Ω((m/2) · (m/2 − 1)) = Ω(m^2)

Therefore the worst-case sequence complexity is Θ(m^2).

Linked Lists

Represent each set by a circularly linked list. Let listx represent
the list containing x and assume the head of the list is the
representative element.

• MAKE-SET(x): make a new set by creating a new linked list with
  element x.
  Complexity: O(1)

• FIND-SET(x): in the worst case, we need to traverse every link in
  the list to reach the head.
  Complexity: Ω(length of list)

• UNION(x,y): Append listy to the end of listx and update the pointers
  so that the combined list is circular again.
  Complexity: O(1)

Worst-case sequence complexity for m operations:

Upper bound

The number of elements in the structure at any point in any
sequence of m operations is ≤ m. The complexity of each operation
in a sequence is O(m), so the total time is O(m^2).

Lower Bound:

What is the worst possible sequence of m operations?

• Perform m/4 MAKE-SETs with different elements.

• Then do m/4 − 1 UNIONs, constructing one list with m/4 elements.

• Followed by m/2 FIND-SETs on the second element in
  the list, so that each one requires time Ω(m/4).

• Total time is Ω((m/2) · (m/4)) = Ω(m^2).

Problem?

Q. How can we improve this?

Linked list with extra pointer to front

Represent each set by a linked list, with

• the first element in the list being its representative

• each element in the list has a pointer back to the head and
a pointer to the next element.

• Let listx represent the list containing x.

• You may assume the head of the list also has a pointer to
the tail for the UNION operation.

Operations

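• MAKE-SET(x): create a new linked list with element x.
  Complexity: O(1)

• FIND-SET(x): follow the pointer from x back to the head of listx.
  Complexity: O(1)

• UNION(x,y): Append listy to the end of listx and update the back
  pointer of every element of listy to point to the head of listx.
  Complexity: Θ(length of listy)

  Why?
  Since we can find the head of listy and the tail of listx in constant
  time, the append itself is O(1); the cost comes from updating one back
  pointer per element of listy.
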
Worst-case sequence complexity for m operations:

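Upper Bound
Same as before: O(m^2).

Lower Bound

• Perform m/2 + 1 MAKE-SETs with different elements.

• Then do m/2 − 1 UNIONs with a set of size one and the growing list,
  each time appending the growing list to the new element, so that the
  i-th UNION updates i back pointers.

• Total time is 1 + 2 + ... + (m/2 − 1) = Ω(m^2).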

Problem?
Still very inefficient if we have long unions.

Q: How can we fix this?

Linked list with extra pointer to front and “union-by-weight”

⇒ Now we keep track of the number of elements in each list.

Q: Are MAKE-SET and FIND-SET affected?
No.

We always append the smaller set to the longer one (so we have fewer
pointers to update).

This is called “union-by-weight”. The “weight” of a set is simply
its size.

Worst-case sequence complexity for m operations:

Upper Bound

• Let n be the number of MAKE-SET operations in the sequence
  (so there are never more than n elements in total).

• For some arbitrary element x, how many times can x’s back
  pointer be updated? Consider when this happens:
  it happens only when listx is UNIONed with a set that is no smaller
  than listx.

• So each time x’s back pointer is updated, the resulting set
  must have size at least twice |listx|.

• So x’s back pointer is updated at most log n times (log base 2),
  since a set can never have more than n elements.

• This is true for every element x. Therefore, the total number
  of pointer updates during the entire sequence of operations
  is O(n log n).

• Since the time for other operations is still O(1), and there
  are m operations in total, the total time for the entire
  sequence is O(m + n log n).

Trees
⇒ Represent each set by a tree, where each element points to
its parent and the root points back to itself.
The representative of a set is the root.
Note that the trees are not necessarily binary trees:

• MAKE-SET(x): just create a new tree with root x.
  Complexity: O(1)

• FIND-SET(x): simply follow “parent” pointers back to the root of x’s
  tree.
  Complexity: O(depth of x)

• UNION(x,y): just make the root of one of the trees point to the root
  of the other.

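[Figure: before UNION(x,y), two separate trees with roots root_x and
root_y, containing x and y respectively; after UNION(x,y), a single tree
in which root_y is a child of root_x.]

  Complexity: Θ(max{height(treex), height(treey)})

Worst-case sequence complexity for m operations: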

Lower Bound

• Just like for the linked list with back pointers but no size
  information.

• I.e., we can create a tree that is just one long chain with m/4
  elements.

How can we create this tree? Using a combination of MAKE-SET
and UNION operations:

    for i = 1 to m/4 do
        MAKE-SET(x_i)
    for i = 1 to m/4 - 1 do
        UNION(x_(i+1), x_i)

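[Figure: the last step, UNION(X_(m/4), X_(m/4-1)). Before it, X_(m/4) is a
single node and X_(m/4-1), ..., X_1 form a chain of m/4 - 1 nodes; after
it, X_(m/4) is the root with that chain below it.]

Creating this tree takes m/4 MAKE-SET operations and m/4 − 1
UNION operations.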

• Now FIND-SET on the deepest element takes time Ω(m).

• If we perform m/2 FIND-SET operations, we get a
  sequence whose total time is Ω(m^2).

Q: How do we know there is not a sequence of operations that
takes longer than Θ(m^2)?

    Same argument as for linked lists.

Q: How can we improve the trees data structure representation
of disjoint sets?

⇒ Keep track of the weight (i.e., size) of each tree and always
append the smaller tree to the larger one when performing UNION.

• The complexities of MAKE-SET and UNION are still O(1)
  and O(max(height(treex), height(treey))), respectively.

• When one tree is appended to another, what is the weight
  of the new tree?
  The sum of the two weights.

• What is the complexity of FIND-SET?
  Suppose during a sequence of m operations, there are n
  MAKE-SET operations; then the maximum height of any tree is
  O(log n).

Q: How would we prove this?
(The proof is by induction on the height h of the trees.)

Q: What does this tell us about the running time of any
individual FIND-SET operation?
O(log n)

So the total time for the entire sequence is O(m log n).

Q: Can we do better?

When performing FIND-SET(x),

• keep track of the nodes visited on the path from x to the
  root of the tree by using a stack or queue

• once the root is found, update the parent pointers of each node to
  point directly to the root.

This is called path compression.

Q: How does this affect the complexity of the FIND-SET operation?

    It doubles it the first time and makes it constant the rest of the time.

Q: What is the complexity of a UNION(x,y) operation?

    It depends on whether FIND-SET has already been called on one/both of x
    and y.

Q: Does the improvement in complexity of UNION and subsequent
FIND-SET operations outweigh the increase in cost of
the initial FIND-SET?

Q: How might we answer this?

    Do amortized analysis (we’ll see this topic next).

Consider a sequence of operations including

• n MAKE-SET ops,

• at most n − 1 UNIONs and

• f FIND-SET ops.

The worst-case running time of the whole sequence is:

    Θ( f log n / log(1 + f/n) )    if f ≥ n
    Θ( n + f log n )               if f < n

Q: Can we do better?

Q: What measure of trees matters the most?

With trees, the measure that matters the most for the running time is the
height of the tree.

Therefore it makes more sense to relate heuristics to the height
of a tree rather than the overall weight in the UNION operation.

• Define the rank of a tree to be an upper bound on the height
  of the tree.

• Note that the rank may not be equal to the height of the tree.

• We’ll store the rank of a tree at its root.

Operations

• MAKE-SET(x): Same as before, but also set rank(x) = 0.

• UNION(x,y): We know rank(treey) and rank(treex).
  Which root of treex and treey becomes the new root?
  The root with the higher rank becomes the new root.
  What is the rank of the new tree?
  The same as the larger rank, unless the two roots have the same rank;
  in that case pick either root as the new root and increase its rank by 1.

• FIND-SET: Nothing changes; use path compression. It does
  not affect rank.

This is the state of the art disjoint set implementation.

Q: How good is the worst-case sequence complexity?

It is possible to prove that the worst-case time for a sequence of
m operations, where there are n MAKE-SETs, is O(m log* n).

Q: What is log*?

It is the number of times that you need to apply log to n until the answer is
at most 1.

Example:

    n = 40 ⇒  5 < log 40 < 6
           ⇒  2 < log log 40 < 3
           ⇒  1 < log log log 40 < 2
           ⇒  0 < log log log log 40 < 1
           ⇒  log* 40 = 4

Back to Kruskal’s Algorithm

KRUSKAL-MST(G=(V,E),w:E->Z)
    A := {};
    insert the edges into a priority queue Q;
    for each vertex v in V
        MAKE-SET(v);
    while (Q not empty)
        e = EXTRACT-MIN(Q);    /* e = (u,v) */
        if FIND-SET(u) =/= FIND-SET(v) then
            UNION(u,v);
            A := A U {e};
        end if
    end while
    return A;
END KRUSKAL-MST

Q: If graph G has n vertices and is connected, then how many
edges does G have?

    m ≥ n − 1

Q: Inserting the edges into a priority queue and extracting the
min for each edge takes how long?

    O(m log m)

Suppose the disjoint sets are implemented as linked lists
with union-by-weight.

(Remember, linked lists have a pointer back to the representative element.)

Q: How many MAKE-SETs do we do?

    n

Complexity?

    O(n)

Q: How many FIND-SETs do we do?

    At most 2m, since each edge is examined once and both of its endpoints
    are looked up.

Complexity?

    O(m)

Q: How many UNIONs do we do?

    At most m (in fact at most n − 1, since each UNION merges two distinct
    sets).

Complexity?

    At most O(n log n) in total.

So the worst-case complexity of Kruskal’s is O(m log m + n +
m + n log n).
The bottleneck is the sorting (priority queue step). Therefore the
complexity is O(m log m).

```
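
The tree representation with union by rank and path compression described in the notes can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the class name `DisjointSet`, the dictionary-based storage, the method names, and the helper `connected_components` are all made up for this example and do not come from the notes. FIND-SET uses the two-pass form the notes describe (remember the visited nodes, then point each of them at the root).

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression.

    Hypothetical helper for illustration; the names are not from the notes.
    """

    def __init__(self):
        self.parent = {}  # element -> parent (a root is its own parent)
        self.rank = {}    # element -> upper bound on the height of its subtree

    def make_set(self, x):
        """MAKE-SET(x): create the singleton tree {x} with rank 0."""
        if x in self.parent:
            raise ValueError(f"{x!r} already belongs to a set")
        self.parent[x] = x
        self.rank[x] = 0

    def find_set(self, x):
        """FIND-SET(x): return the root of x's tree, compressing the path."""
        path = []
        while self.parent[x] != x:   # first pass: walk up to the root
            path.append(x)
            x = self.parent[x]
        for node in path:            # second pass: point visited nodes at the root
            self.parent[node] = x
        return x

    def union(self, x, y):
        """UNION(x, y): hang the root of lower rank below the other root."""
        rx, ry = self.find_set(x), self.find_set(y)
        if rx == ry:
            return                   # same set already: nothing to do
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx          # rx is now the root of higher or equal rank
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1       # equal ranks: the new root's rank grows by 1


def connected_components(vertices, edges):
    """Group vertices by component using the MAKE-SET / UNION loop from the notes."""
    ds = DisjointSet()
    for v in vertices:
        ds.make_set(v)
    for u, v in edges:
        ds.union(u, v)
    groups = {}
    for v in vertices:
        groups.setdefault(ds.find_set(v), set()).add(v)
    return list(groups.values())
```

Two vertices `u` and `v` are connected exactly when `ds.find_set(u) == ds.find_set(v)`, which is the test the notes use for connected components.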
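
The Kruskal analysis in the notes assumes linked lists with back pointers and union-by-weight. A minimal Python sketch of that combination is given below. It rests on several assumptions that are not in the notes: the function name `kruskal_mst`, the `(weight, u, v)` edge format, a dictionary `rep` standing in for the back pointers, and Python lists standing in for the linked lists. The standard-library `heapq` module plays the role of the priority queue.

```python
import heapq


def kruskal_mst(vertices, weighted_edges):
    """Return (total_weight, mst_edges) for an undirected weighted graph.

    weighted_edges is an iterable of (weight, u, v) tuples; vertices must be
    mutually comparable so that ties in weight can be ordered by the heap.
    """
    rep = {v: v for v in vertices}        # MAKE-SET: element -> representative
    members = {v: [v] for v in vertices}  # representative -> elements of its set

    def union(u, v):
        """Union-by-weight: relabel the smaller set with the larger set's rep."""
        ru, rv = rep[u], rep[v]
        if len(members[ru]) < len(members[rv]):
            ru, rv = rv, ru               # ru now names the larger set
        for x in members[rv]:             # one back-pointer update per element
            rep[x] = ru
        members[ru].extend(members.pop(rv))

    heap = list(weighted_edges)
    heapq.heapify(heap)                   # insert the edges into a priority queue
    mst, total = [], 0
    while heap:
        w, u, v = heapq.heappop(heap)     # EXTRACT-MIN
        if rep[u] != rep[v]:              # FIND-SET(u) =/= FIND-SET(v): no cycle
            union(u, v)
            mst.append((u, v, w))
            total += w
    return total, mst
```

Because only the smaller set is relabelled, each element's representative changes at most log n times, which is where the O(n log n) term in the notes' total comes from.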
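
The log* example in the notes can be checked with a few lines of Python. This sketch assumes base-2 logarithms (which the bounds 5 < log 40 < 6 imply) and the "until the answer is at most 1" stopping rule; the function name `log_star` is made up for this example.

```python
import math


def log_star(n):
    """Count how many times log2 must be applied to n to reach a value <= 1."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count
```

For n = 40 the loop runs four times, matching the four lines of the worked example. The function grows extremely slowly: any n up to 65536 needs at most four applications.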