Wednesday, March 7, 2012

About Disjoint-Set Forests

Disjoint-Set Forests is very useful to solve the problems such as connected component in a graph. For example,  we have several set {a, b, c}, {d, e}, {f}. The elements in set is vertices that connected to each other. If we have an edge e(b,f), then we can merge the set {a,b,c} and {f} since now they are connected.

The common operations in disjoint-set are Make-Set, Union and Find-Set. Here Union is the operation that merge two sets and Find-set is that given a element, find out which set this element is in. To model the disjoint-set problem. We can use a forest such that each tree is a set. The representative of a set is the root of that tree. And each element in the tree has a parent pointer. If we want to do Union, we can just put one tree as the sub tree of another. If we want to do Find-Set, we just start from that element and follow its parent pointer until we reach an element whose parent is itself.

However, to make the operations efficiently, two optimization can be made:

  1. Union by rank: In order to keep Find-Set efficient, we need to keep the depth of the tree small. In order to do that, for each tree, we can maintain a rank which stands for the largest depth of the tree. When we Union two trees, we make the tree with smaller rank as the subtree of the tree with bigger rank. Basically, we made the the parent pointer of the root of the smaller ranked tree to point to the root of the bigger ranked tree.
  2. Path compression: Still we want to further improve Find-Set, the idea is after we do a Find-Set, actually we had a path from root to the query element. Then we can modify all the element in this path to point their parent pointer to the root, which will shorten the path for future query.

No comments:

Post a Comment