Tristan's Collection of Interview Questions: 2011

Friday, December 23, 2011

Something about Template Specialization

When you have a generic template class, some functionalities are not shared by all the data types and they are no needed for some specific data types. Template specialization just kicks in for this occasion. You can first declare a generic template for "all" the data types, then for a specific data type, you can provide a specific implementation. For example, if you want to implement a vector template, but you want to be space-efficient for bool type (e.g., use bit-vector to store them) other than some generic approach (e.g., use one integer to store one bool object). Then you can do as follow:

//generic template
template <typename T>
class vector
{
    // accessor functions and so forth
    private:
    T* vec_data;   // we'll store the data as block of dynamically allocated 
                   // memory
    int length;    // number of elements used 
    int vec_size;  // actual size of vec_data
};

//specialized
template <>
class vector <bool>
{
    // interface

    private:
    unsigned int *vector_data;
    int length;
    int size;
};

Besides, the interface for this specialized interface can be totally different from the generic template: they can have different methods. In some sense (or scenarios), template specialization can implement (substitute) the functionality of virtual inheritance. The major drawback of template specialization is increased code size and resulting complexity.

Similar to template specialization, we can also have partial specialization, which means you specialize certain features but still leave some feature for users to choose. The following shows one such example.

//generic template
template <typename T, unsigned length>
class fixedVector { ... };

//partial specialization
template <unsigned length>
class fixedVector <bool, length> { ... };

As we can see here, in the partial specialization, the vector is determined to contain the bool type, but the length can still be specified.

Sunday, December 11, 2011

Two's Complement

Two's Complement is a system used to effectively represent negative integers such that we can treat the negative number as ordinary number and reuse the addition.

The problem arouses from One's Complement. For example, 2 is denoted as "00000010" and "-1" is denoted as "11111110" in One's Complement. Then naive way of doing "2 + (-1)" will get "00000000". That is incorrect. We need to further add a carry bit "1" to the sum. Then we can get "00000001", which is the right answer. However, we need to do an extra addition here.

In Two's Complement, the negative number is denoted by inverting all the bits in a positive number and adding 1. For example, "-1" is denoted as "11111111". Then we can just use the ordinary addition which save one extra addition. Besides, there is only one "0" in Two's Complement, but two in One's Complement (0 and -0).

Tuesday, November 8, 2011

Enumerate the Ways to Cover a M*N Matrix with Dominoes

Problem: Given a M*N matrix, enumerate all the different ways to cover it with dominoes (1*2 blocks). For example, for a 1*2 matrix, there are only one way; for a 2*2 matrix, there is 2 ways. The original problem is from poj 2411.

Solution: We can definitely use backtracking, but more efficient approaches should be DP. First we need to think what are the states and how to represent them. Obviously, the arrangement of dominoes on the matrix is the state. There might be multiple ways to represent them. Here we just introduce one approach: every cell in the matrix is either 0 or 1; "0" means the domino block is placed horizontally and "1" means the domino is placed vertically. If a cell, say a[i,j] is marked as "1", a[i+1,j] can't be marked as "1". Then we can use the binary sequence to represent the status of one row. For example, "0000" means two dominoes lay horizontally. Besides, there are several key observations:

For each row, there must be even number of consecutive "zero"s. Otherwise, we cannot fully cover a row.
The status of current row only depends on the previous row. It means we only need to look back one row.
Then we build a status matrix d[i][s] with i as the row number and s as a particular state. d[i][s] means the number of ways to arrange a matrix with current state (all i-1 rows are filled and the ith row is with state s). Therefore, d[i][s] =sum_{all possible d[i-1][k]}.
How to decide d[i-1][k], first, s&k should be zero, which is to conform to the constraint " a[i,j] is marked as "1", a[i+1,j] can't be marked as "1". Second, s|k should be a valid arrangement, which is to conform to the constraint " there must be even number of consecutive zeros". Then the cell d[M][0] stores the answer to our problem.

typedef long long LL;
const int maxn = (1 << 12);

LL h , w , dp[12][maxn];

//to decide if it contains odd number of consecutive "zero"s
bool judge(LL s)
{
     LL cnt = 0;
     for(LL i=0; i < w; i++)
     {
         LL t = s & 1;
         if(t)
         {
             if(cnt & 1)
                 return false;
             else
                 cnt = 0;
         } else
             cnt ++;
         s >>= 1;
     }
     if(cnt & 1)
         return false;
     return true;
}

int main()
{
     while(~scanf("%lld %lld", &h, &w))
     {
         if(h == 0 && w == 0)
             break;
         memset(dp , 0 , sizeof(dp));

         //initialize the state in row 1
         for(LL i=0; i < (1 << w); i++)
         {
             if(judge(i))
             {
                 dp[1][i] = 1;
             }
         }
         for(LL i=2; i <= h; i++)
         {
              for(LL j=0; j < (1 << w); j++)
              {
                 for(LL k=0; k < (1 << w); k++)
                 {
                     LL t = j & k;
                     if(t)
                          continue;
                      //so here "t" is the actual state 
                      //on the current row, since if 
                      //the position in the above 
                      //row is marked as "1", the 
                      //position in current row must 
                      //be "0", otherwise this 
                      //iteration will be skipped by 
                      //previous "continue". But, 
                      //even though it is "0", we 
                      //need to count it as "1", 
                      //since it cannot hold domino.
                      //This is what "t=j|k;" 
                      //really means.

                      t = j | k;
                     if(judge(t))
                      {
                         dp[i][j] += dp[i - 1][k];
                      }
                 }
              }
          }
         printf("%lld\n", dp[h][0]);
     }

}

The code is from xIao.wU.

Saturday, November 5, 2011

The Longest Palindrome Substring (Manacher's algorithm)

Problem: Given a string, find a longest palindrome substring.

Solution: We can use general suffix tree that stores the original string and its reverse, which is an O(N) algorithm. However, here we give a better one with less space overhead while still O(N) complexity. This algorithm is called Manacher's algorithm. If we check a string from left to right, we can leverage the palindrome check we did previously. This is from the symmetry of palindrome. The main idea is as follow:

Create an array called P[], P[i] stands for the longest palindrome centered at location i. Here i is not the index in the original string. For the original string, the locations we need to check for palindromes contains those characters in string along with the spaces between characters. So if we have a string of length l, we need to have a P[] with length 2*l+1.
Our goal is to fill in P[]. For a particular position, we check its left and right. If equals, we extend our check further. Otherwise, the longest palindrome centered at location is found.
However, we need to be smarter. Actually we can leverage previous computed P[i] when we calculate a P[x] where x>i.
So here we add two pointers, p1 and p2, which point to the left and right of the current location i such that |i-p1| = |i-p2| and p2>i>p1. We know p1 refers to a palindrome t and i refers to a palindrome s. If the first character of t is strictly on the right of the first character of s, we know P[p2] = P[p1].
Otherwise, say if the first character of t is not strictly on the right of the first character of s, we have P[p2] >= r - p2. where r is the right bound of the palindrome that centered at i. We then need to check if the palindrome at p2 can be longer than r - p2. The good thing is that we only need to start the characters beyond the length of r - p2.
When the first character of t is strictly on the right of the first character of s, we don't need to move the current center (i). Only when the first character of t is not strictly on the right of the first character of s, we need to move the current center to p2.
The total cost is O(N).

The code is as follow:

void manacher(const string &s)
{
    int len = s.size();
    if(len == 0) return;

    int m[2*len+1];
    m[0] = 0;
    m[1] = 1;
    // "cur" is the current center
    // "r" is the right bound of the palindrome
    // that centered at current center
    int cur, r;
    r = 2;
    cur = 1;

    // iterate from 2 to 2*len+1
    for(int p2=2; p2<2*len+1; p2++)
    {
        int p1 = cur- (p2-cur);
        //if p1 is negative, we need to 
        //move "cur" forward

        // re-adjust cur based on p2
        while(p1 < 0)
        {
           cur++;
           r = m[cur] + cur;
           p1 = cur- (p2-cur);

        }

        // If the first character of t is 
        // strictly on the right of the 
        // first character of s

//

        // Or here, from the symmetry, if

        // the palindrome centered at cur

        // cover the palindrome centered at

        // p1, we know
        if(m[p1] < r - p2)
            m[p2] = m[p1];
        //otherwise
        else
        {

           // we need to explore the length of

           // the palindrome centered at p2

           // if the palindrome centered at cur covers

           // p2, we can start at "k = r-p2"

           // otherwise, we start at "k=0"

           //reset "cur" 
           cur = p2;
           int k = r-p2;
           if(k<0) k = 0;
           while(1) 
           {
              if((p2+k+1)&1)
              {
                if(p2+k+1 < 2*len+1 && p2-k-1 >=0 && s[(p2+k)/2] == s[(p2-k-2)/2])
                  k++;
                else break;
              }
               else
              {
                if(p2+k+1 < 2*len+1 && p2-k-1 >=0)
                  k++;
                else break;
              }

           }
           // set the right boundary to be "p2+k"
           r = p2+k;
           m[p2] = k;
        }


    }

 
}

Thursday, November 3, 2011

About Smart Pointer, Copy-on-write

This link is very instructive.

Find the Longest Sub-sequence that is a Palindrome within a String

Problem: Given a string, you can delete any characters, find the longest sub-sequence (the characters remained after your deletion) that is a palindrome.

Solution: For palindrome problem, one trick often used is to reverse the string. Here we first reverse the string, then find the longest common sub-sequence between the new string and the original one. It is a O(n^2) solution. Remember, there could be multiple longest common sub-sequences, some of them may not be palindrome, you need do some checks.

Tuesday, November 1, 2011

About American Flag Sort

American Flag Sort is an extension of Dutch Flag Problem which divides an array into three group. American Flag Sort further divides an array into multiple buckets based on certain order. For example, 256 buckets based on ASCII value. Essentially, it is a in-place radix sort. It is favored for sorting integers and strings givens its speed and space advantage.

The algorithm needs two passes. First scan the array, so that we know the number of the the objects that will be stored in each bucket.
Then we know the start location of each bucket. Scan the array for the second time, put the object to the corresponding bucket and update the bucket cursor.
If sorting strings, may need to apply the same algorithm to sort objects within each bucket.

0-1 Knapsack Problem

Problem: Given a knapsack that can hold at most W weight items, also given a list of items with their weight wi and value vi (no items share the same weight), try to find a valid assignment which achieve the highest value in the knapsack (can't be over-weighted at the same time).

Solution: We can use DP to solve this problem. However, one-dimension DP is not enough. If we only record state s[0], s[1], ... s[W], the later state may not be able to reuse previous states. Instead, we need a two-dimension DP here:

First sort the items based on their weights
The status we want to calculate is s[i, w], which means if the total weight is w and we can only use up To the ith item (based on weight and non-descending), the optimal maximum value we can get.
s[0, w] = 0 and s[i, 0] = 0.
For s[i, w], if wi > w, s[i, w] = s[i-1, w]; otherwise, s[i, w] = max ( s[i-1, w], s[i-1, w-wi] + vi).

Multi-Processor Scheduling Problem

Problem: Give a multi-processor machine with n processors and m jobs, how to make a scheduling that has the shortest finish time. Each job may take different amount of time to process.

Solution: There are also some variants to this problem, like putting m integers to n buckets. The solution to this problem can be based on some heuristics and is called LPT algorithm. Attention: this is a greedy approach which is a sub-optimal solution, not an optimal solution.

Basically we first sort the job based on their process time.
Choose the job with the longest process time among the current unprocessed jobs and put it into a machine which has the earliest finish time so far.
Repeat the above steps until all jobs are assigned.
The complexity is O(m*logm+m*logn+n).

Friday, October 28, 2011

Find 4 Elements in An Array that Sum to A Target Value

Problem: Given an array and a target value, find 4 elements in the array that sum to this target value.

Solution: Here introduce an O(n^2) algorithm. Only briefly explain the idea:

Calculate the sums of all pairs in the array and form a new array. Each element in the new array is a struct that stores the sum and the two elements who make the sum. Sort the array by the sum. The cost is O(n^2) till now (may need radix sort).
The problem is converted to find 2 elements in the new array that sum to the target value. The cost for this operation is the length of the new array, which is also O(n^2). So total cost is O(n^2).

Some Problem Solved by Suffix Tree

Problem 1: Find the longest common substring among k strings.

Solution: If we use DP, it takes O(n1*n2*...nk). With suffix tree, it takes O(n1+n2+...nk). Basically, we build the suffix tree for n1 first, then add n2 to this tree, then n3... All the strings share the same generalized suffix tree. We need to track if each node (path) is shared by all the string or not. When the building is done, the longest common substring will be found.

Problem 2: Find the longest repeated substring in a string.

Solution: First, build the suffix tree for this string. Then find the deepest internal node, from the root to that node is the substring we are looking for. The "deepest" is decided by the number of characters.

Thursday, October 27, 2011

Serialize a Binary Tree

We know with pre-order and mid-order we can decide a binary tree. Besides, tree have array representation. However, to have better space efficiency, we can just store a binary tree according to its pre-order, but with null node also stored!

Find Two Substrings with Longest Distance from Two Strings

Problem: Given two strings s1 and s2, s1' and s2' are two substrings of s1 and s2, respectively. Find the s1' and s2' with longest distance. Distance is defined as \sum_i| s1' [i] -s2' [i]|.

Solution: Obviously, s1' and s2' should be of equal length. Here give a O(n*m) algorithm where n and m are the length of s1 and s2, respectively.

Calculate c[i][j] = |s1[i]-s2[j]|, then we have a matrix.
Swipe the matrix with a diagonal line. For example, s1 = "35645", s2 ="2475", we have matrix c as follow:
1 1 4 2
3 1 2 0
4 2 1 1
2 0 3 1
3 1 2 0
First we scan "2", then "4 0", then "1 2 1". You calculate the sum of the numbers in the line. Later you will find the maximum is "3 2 3 0" or "3 2 3", which stands for the distance of "5645" and "2475".

Find the Kth (smallest) Pair from the Elements in Two Sorted Arrays

Problem: Given two sorted array, by choosing one element from each array we have a pair. A pair can be ranked by the sum of the two numbers within it. Find the Kth smallest pair.

Solution: Naive approach takes O(n*m). Here we can borrow the idea that is used to merge multiple arrays. Basically we need a heap or a priority-queue. The main idea is as follows:

Put the first pair (which is definitely the smallest) {a[0], b[0]} in the heap as bootstrapping.
Loop over the heap, if heap is not empty, pop the top of the heap {a[i], b[j]}. If i<n-1, push {a[i+1], b[j]}; If j<m-1, push {a[i], b[j+1]}.
pop k times we will get the kth smallest pair. The complexity is O(k*log(n+m)).

Tuesday, October 25, 2011

Volume of Water Held by a Histogram

Problem: Pour water on a histogram, calculate the water volume the histogram can hold.

Solution: This is a relatively simple DP problem. Here we only give the main idea.

For a particular bar bi, if we know the highest bar on its left Li and highest bar on its right Ri. If the height of bi is smaller than both Li and Ri, the water volume can be held on this bar is min(Li, Ri) - hi; otherwise, it can't hold water.
To calculate Li and Ri, we just need to record the maximum height we had observed so far from the left (and from the right). Therefore, a O(n) algorithm is straightforward here.

Maximum Rectangle Area within a Histogram

Problem: Given a histogram, find the maximum rectangle area within it.

Solution: It is easy to find a O(n^2) algorithm by comparing the height of a bar with the rest bars. Here we give a O(n) algorithm.

For each bar bi, we need to know the number of adjacent bars on the left Li and on the right Ri that are higher than it. Then the maximum rectangle within the histogram with the height hi will be (Li+Ri+1)*hi. Then we just need to select the largest one.
When deciding the number of adjacent bars on the left of bi, the key observation is that we don't need to inspect every bar on its left. The key here is to use a stack to track the bars that had been inspected in a smart way. The bars in the stack are in ascending order (from base to top). Besides, before pushing new bar, we need to pop the bars in the stacks that are higher than it. Then we can achieve calculate Li in O(n). Calculating Ri will be the same.

The following only shows how to calculate Li:

//bar[] represents the height of each bar in the histogram
//left[] stores the number of adjacent bars that are taller than bi 
//this stack is used to track inspected bars
stack s; 

for(int i=0; i<N; i++)
{

  int num = 0;
  int idx;
  while(!s.empty())
  {
    idx = s.top();
    if(bar[idx]>bar[i])
    {
       num++;
       s.pop();
       left[i]+= left[idx];  
    }
    else break;
  }

  s.push(i);
  left[i]= num ? num + left[i]:0;

}

Monday, October 24, 2011

Find the Convex Hull

Problem: a series of points are on plain (X>0 and Y>0), find the convex hull of these points, a.k.a, the points which can form a hull that include all the other points.

Solution: Here only main idea is given.Basically we can use Graham scan.

Find the point p0 with the smallest y coordinate; if multiple exist, choose the one with the smallest x coordinate, this point must be one of the vertices on the convex hull.
Sort the other points based on the angle of p0->pi.
Put p0, p1 and p2 in a stack (we need at least three to form a hull), then we inspect the rest points based on the order. Assume px is the stack top and py is the one next to the top, if py->pi is at the right side of py->px, pop px. We apply the same check to the new px and py until the previous condition doesn't hold. Then we push pi if previous there are any pop operations. If before push pi, there are only two elements in stack, we can push pi directly.
After we iterate all the remaining points, the points on the stack are those that form the convex hull. The complexity is O(nlogn), primarily the sorting cost.

More: Alternatively, we can use Jarvis’s march which is asymptotically faster than Graham’s scan. The time complexity is O(nh) where h is the number of the vertices on the hull. The main idea is as follow:

Find the highest point p_high and lowest point p_low among all the points, which takes O(n).
Start from p_low, find the next point that has the smallest polar angle (using +x axis) with respect to p_low. Assume this point is p', then find the next point that has the smallest polar angle with respect to p'. Repeat such process until we find p_high. Up to now, we had found the left chain of the hull.
Then proceed to find the right chain of the hull. It is similar to finding the left chain. We start from p_high and stop when we reach p_low. The only difference is that when we calculate the polar angle, we use -x axis instead of +x axis.

Output a set's all sub sets

Problem: Given a set, output all its subsets. For example, for set (a,b,c), we need to output (), (a), (b), (c), (a,b), (a, c), (b,c), (a,b,c).

Solution: iterative approach is not difficult. Here give a recursive approach:

void _print(const vector<int>& set)
{
    i; i<< set[i] << " ";
    cout << endl;
}


void print_set(const vector <int>&set, int i, vector<int>& output)
{
   if(i >= set.size()) _print(output);
   else {
     output.push_back(set[i]);
     print_set(set, i+1, output);
　　　// code to handle duplicates
     int tmp = output.back();
     output.pop_back();
     
     if(output.size()>0 && tmp== output.back()) return;
          
     print_set(set, i+1, output);
   }
}

int main()
{
    vector<int> set,output;
    set.push_back(1);
    set.push_back(2);
    set.push_back(3);
    set.push_back(4);

    print_set(set, 0, output);
}

Why we need virtirtual destructor?

When you release the resource of an class object, usually you leverage the object's destructor. Assume we have two class here: base Class A and derived Class B, for the following code:

class A{
  public:
    ~A(){}
};

class B: public A{
   public:
    ~B(){}
};

B *b = new B();
delete b;

First B's destructor will be called, followed by A's. On the other side, if we have the following code:

class A{
  public:
    ~A(){}
};

class B: public A{
   public:
    ~B(){}
};

A *a = new B();
delete a;

Only A's destructor will be called. In order to also call B's destructor in the second code example, the ~A() should be declared as virtual.

Friday, August 12, 2011

Find the Closet Pair of Points

Problem: Given N points on a cartesian ordinate plane, find the two points with the shortest distance.

Solution: We can use the approach that solving P(n) based on P(n-1). We first sort the points based on their X coordinates. Then assume we have solved the problem with the first n-1 points and the shortest distance is d. Then given the nth point, we don't need naively compute the distance between the nth point and all the previous n-1 points. We can leverage the important information we have obtained: "d". If we can further find a closer pair given the nth point, the other point in the pair should be in a specific area. The area is a rectangle with dimension (d*2d) that sits on the left of the nth point. Moreover, the number of potential points that could be in this area is limited, at most six points (which can be proved using pigeonhole principle). Therefore our algorithm comes as follows:

two important data structures are used. The first one is an array X_array that contains all the points sorted by their X coordinates. The second is a BST Y_BST that contains the active points sorted by their Y coordinates. Insert the first two points into X_array and Y_BST. Initialize d_min as the distance between the first points.
using a sweeping line method. Have a vertical line sweep from left to right and start from the third point in X_array.
when the line touch a point p, given the current d_min, first we need to remove the points in X_array that have distance larger than d_min to p. We also need to remove these points in Y_BST. Then we add p to Y_BST and adjust the active region within X_array accordingly.
search the points in the range [p.y-d_min, p.y+d_min] in Y_BST, for each point in the range, calculate its distance to p. If smaller than d_min, update d_min.
When all the points have been swept, the algorithm finishes.
The complexity for the whole algorithm is O(nlogn). Why? The initial sorting cost for X_array is O(nlogn). For each points being removed from the active region in X_array, it takes O(logn). So the total cost for removing points from the active region in X_array is O(nlogn). Similarly, the total cost for removing points from Y_BST is also O(nlogn). Since there are limited number of points in the rectangle with dimension (d*2d). The cost to calculate the distance between them and p is constant. Therefore, the complexity for the whole algorithm is O(nlogn).

typedef pair<double, double> Coordinate;

bool compX(Coordinate p1, Coordinate p2)
{
   if(p1.first != p2.first)
        return p1.first < p2.first;
   else return p1.second < p2.second;
}

bool compY(Coordinate p1, Coordinate p2)
{
  if(p1.second != p2.second)
    return p1.second < p2.second;
  else return p1.first < p2.first;
}

double cal_distance(Coordinate p, Coordinate q)
{
   double a = abs(p.first-q.first);
   double b = abs(p.second-q.second);

   return sqrt(a*a + b*b);
}

double find_closest(vector<Coordinate> &co)
{

   //sort based on X coordinates
   sort(co.begin(), co.end(), compX);

   if(co.size() == 0) return -1;
   if(co.size() == 2) return cal_distance(co[0], co[1]);

   double d_min = cal_distance(co[0], co[1]);
   //inition the y table
   bool(*fn_pt)(Coordinate,Coordinate) = compY;
   set<Coordinate, bool(*)(Coordinate,Coordinate)> y_table(fn_pt);
   y_table.insert(co[0]);
   y_table.insert(co[1]);

   //start from the third point from the left
   int start = 2;

   vector<Coordinate>::iterator tp;
   vector<Coordinate>::iterator it_head = co.begin();
   vector<Coordinate>::iterator it_tail = it_head+1;

   for( ; start<co.size(); start++)
   {
       double l_bound = co[start].first - d_min;
       Coordinate lc(l_bound, 0);
       tp = lower_bound(it_head, it_tail, lc, compX);

      //delete points from y_table
      vector::iterator pp, endp;
      if(tp == it_tail)
      {
         if((*tp).first > (*it_tail).first)
           tp = it_tail+1;
      }

      for( pp = it_head; pp!=tp; pp++)
      {
         y_table.erase(*pp);
      }
      it_head = tp;
      it_tail++;
      y_table.insert(co[start]);

      //select points from y_table
      double y_low = co[start].second - d_min;
      double y_high = co[start].second + d_min;

      //search for these two bounds
      set<Coordinate>::iterator sp, sp_low, sp_high;
      Coordinate y_lc(0, y_low);
      Coordinate y_hc(0, y_high);
      sp_low = lower_bound(y_table.begin(), y_table.end(), y_lc, compY);
      sp_high = lower_bound(y_table.begin(), y_table.end(), y_hc, compY);

      //calculate the distance and compare
      for(sp=sp_low; sp!=sp_high; sp++)
      {
        if(sp->first == co[start].first && sp->second == co[start].second)
             continue;
        double new_d = cal_distance(co[start], *sp);
        if(new_d < d_min) d_min = new_d;
      }

   }

   return d_min;
}

Alternative approach: We can also try to use K-d tree, more precisely, 2-d tree. K-d tree is basically a binary tree. It is constructed by recursively partition the points based on a particular axis (dimension). For example, for a 2-d tree, we first partition by x-coordinate (it is also ok to partition by y-coordinate), given all points, we find the points which x-coordinate is the median of the x-coordinates of all other points. Then we take the points which has this x-coordinate as root. The remaining points are divided into two groups: a) those which x-coordinates are smaller than root.x b) those which x-coordinates are bigger than root.x. Then we recursively partition these two groups, but by y-coordinate. This will help us find the nodes on the second level (assume root is the first level). To find the nodes at the third level, we partition by x-coordinate. Repeat this until we can't partition any more.

Then we need to apply the Nearest Neighbor Search (NNS) in 2-d tree. Here I just briefly explain the algorithm.

First, we need to locate the query point query point q. This process is similar to a binary search based on q's coordinates. We stop when we encounter a leaf node. Calculate the distance between this leaf node and q. This is the current best (nearest neighbor).
However, we need to move forward since the current best may not be optimal. We just rewind the recursion, go to the parent of the leaf node. Calculate the distance between this (parent) node and q. If necessary , update the current best. We also need to see if there remains some nearer neighbors in the other plain divided by this (parent) node. This can be done by checking if the circle centered at q with the radius current best across the splitting line. If yes, we need to check the other branch of this node. Basically, we repeat previous steps, searching q in that subtree.
If we are done with the (parent) node, we just keep going up, check the parent of this (parent) node, until we encounter the root node.

The empirical performance for a NNS in 2-d tree is O(logn). Then do the NNS for all the nodes is O(nlogn). Considering the cost for building 2-d tree is O(nlogn), the total cost for this problem is also O(nlogn).

See Also: The closest pair problem

Saturday, July 30, 2011

Something about Interrupt, Mutex and Semaphore

The details of interrupt and mutex are not that simple. Be aware of the following bullets:

Disable/Enable interrupt can be used to implement mutex. It is simple but flawed.
For multi-processors, disable interrupt may not be useful, unless you disable interrupt on all processors, which will be prohibitively expensive.
Atomic instruction, such as test-and-set, can be used to implement mutex. test-and-set is an atomic instruction which writes 1 to a memory location and fetch the old value of that location. For example, a spin-lock implementation is given as follow:

while (test_and_set(lock)==1);

Besides, other atomic instructions such as exchange, compare&swap, load linked and conditional store can also be used.
For the example given in 3), we will busy wait. To minimize the waiting time, we can do the following (add a guard variable):

while (test_and_set(guard)==1);

if(lock_value == 1)
{
   put the thread in the waiting queue for the lock;
   go to sleep;
   guard = 0;
}
else
{ 
   lock_value = 1;
   guard = 0
}

Spin lock can sometimes delay releasing the lock. For example, thread A get switched out right after grabbing the lock. Thread B kicks in and try to grab the lock, but the lock has already been grabbed. Thread B has to be busy waiting, which will delay the switching back to thread A.
Semaphore represents the number of resources that are still available to simultaneous users (e.g. there are still 4 available slots)..
Besides, we can also use implement some lock-free data structure in pursuit of better performance. These structures usually leverage some atomic instructions or atomic variables (the read/write to the variable is atomic, e.g. pointer type.). Some data structures are weak enough to be implemented without special atomic primitives, e.g., FIFO queue.

Friday, July 29, 2011

Things to Remember about memcpy()

If you want to implement memcpy(), you need to pay attention to the following things:

if you can use wider data type, use it (e.g., copy a 32 bit instead of 8 bit).
leverage the instruction set provided by the underlying architecture (processor). For example, some architecture has *p++, some only has *(++p).
memory alignment. Sometimes if you want to do 32bit copy, the instruction may require memory address to be aligned. Therefore some extra work needs to be done.
use pointer such as const char * for src.
if the src and dest memory address overlap with each other, some extra work needs to be done.

Some Tricks about Optimizing Branches

You should be cautious when using branches in your code. Sometimes the performance could be hurt.

while (++i > count)

The above code may generate complex machine code. It is much better to let the compiler to generate code that utilizes the processors compare instructions in an efficient way. This means creating loops that terminates when a compare value is zero, which is as follow:

while (count--)

The following code is also not good:

while (count-=2)

Sunday, July 24, 2011

Find the Least Common Ancestor of Two Tree Nodes

Problem: Given two nodes in a tree, find the least common ancestor of them.

Solution: This is not a difficult problem. If there is parent pointer, things will be easier. However, simple solution still exist for tree nodes without parent pointer. See the code below:

node * common_ancestor(node *root, node *n1, node *n2)
{
   if(root==null || n1 == null || n2 == null)
     return null;

    if(n1 == root || n2 == root) return root;
   
   node * left = common_ancestor(root->left, n1, n2);
   node * right = common_ancestor(root->right, n1, n2);
   
   if(left && right) return root;
   
   return left ? left : right; 

}

More: if we need to do many common_ancestor() operations, there is a more efficient off-line algorithm called "Tarjan's off-line least common ancestors algorithm". It can get the least common ancestors for m pairs of nodes in just one tree traversal. The main idea is to leverage disjointed forest with path compression and union by rank.

Friday, July 22, 2011

General Monty Hall Problem and Information Theory

Problem: There are three doors and only there is one door behind which there is a prize.You first select a door, then the host of the game will open one door behind which there is no prize (he can't open the door you had selected). Then you are given the choice to switch the door or not. This is the typical Monty Hall problem people talk about. The general form is that n doors with only one door behind which there is a prize. The host of the game will open m doors (m<n-1) behind which there are no prize. Then you are given the choice to switch to another door.

Solution: The most most important observation about this problem is that "probability" is actually not about "randomness", but about information. How much information we have will influence the probability". Now let's first look at the basic form of the Monty Hall problem.

assume you chose door A, the probability that you had made the correct choice, P(A), is 1/3. Assume the host opened the door C (the case for door B is the same), let us denote this action as O. The probability, P(A|O), which means the probability that you had made the correct choice given the host opened the door C, is what we are interested in. Actually P(A|O) = P(A) = 1/3, since unless you move the prize, there is no way you can change the odds of your original choice.
Besides, we can systematically calculate P(A|O). According to Bayes's theorem, P(A|O) = P(O|A)*P(A)/P(O). We know P(A) = 1/3. P(O|A) should equal 1/2, since if A is the right answer, B and C are both empty doors. The the probability for the host to open door C is simply 1/2. Then we need to calculate P(O).
P(O) = P(O|A)*P(A) + P(O|B)*P(B) + P(O|C)*P(C). It is easy to know, P(O|C) = 0, also P(O|A)*P(A) =1/6. We have P(O|B) = 1. The reason is that we had already chosen A, but B is the right answer, so the only choice for the host is C. Therefore, P(O) = 1/6+1*1/3 = 1/2. Then we have P(A|O) = 1/3.
Then the probability to win if we switch is P(B|O) = P(O|B)*P(B)/P(O) = 1*1/3 / (1/2) =2/3, so we need to switch!

Now let's try to tackle the general problem.

The probability to win if we stick to the original choice (assume it is A again), P(A) = 1/n, also we have P(A|O) = 1/n.
The probability to win if we switch, P(S|O) = P(We switch to the correct door | Our original choice is wrong) = (1/(n-m-1))*(1-1/n) = ((n-1) / (n-m-1))*1/n). It is easy to know P(S|O) > P(A|O), so we still need to switch.
Thinking in the way of information theory, after host reveal some empty doors, we are given more information. These information will change the probability distribution!

The Mystery about sizeof()

sizeof() is to get the size of a class or an class object. It might be simple conceptually, but there are some details you may not know.

what does sizeof() return?

class A
{
  static char x;
  int y;
};

int main()
{
  int s1 = sizeof(A);
}

s1 = 4. Why s1 = 4? The size of static members will not be counted into the size of the class. The reason is that those static members are stored centrally and shared by all the instances of the class.

what does sizeof() return?

class A
{
  char x;
  int y;
};

int main()
{
  A a;
  int s1 = sizeof(A);
  int s2 = sizeof(a);
}

s1 = 8 and s2 = 8. Why s1 = 8? The size it really needs is just 5 bytes. The reason is for alignment so padding is added. Why s2 = 8? sizeof(a) is equivalent to sizeof(A). Moreover, the padding scheme will sometimes make the order of data members matter. For example, if we have "char a; int x; char b;", the size will be 12. However, if we have "char a; char b; int x;", the size will be 8.

what does sizeof() return?

class A
{
  char x;
  int y;
  virtual void bar();
};

int main()
{
  int s1 = sizeof(A);
}

s1 = 12. Why s1 = 12? Since virtual function is defined, 4 extra bytes need to be allocated for the pointer to virtual function table.

what does sizeof() return?

class A
{
  char x;
  int y;
  void bar();
};

int main()
{
  int s1 = sizeof(A);
}

s1 = 8. Why s1 = 8? Since there is no virtual function defined, we don't need the 4 extra bytes for the pointer to virtual function table. For non-virtual function, there is a central place to find those functions, therefore we don't need such pointer.

what does sizeof() return?

class A
{
  char x;
  int y;
  void bar();
};

class B{
   int a;
   A aa;
   virtual void somefunction() ;
}

int main()
{
  int s1 = sizeof(B);
}

s1 = 16. Why s1 = 16? The size of class A is 8. class B has one integer, one class A instance plus the pointer to the virtual function table. So the total size is 8+4+4 = 16 bytes.

what does sizeof() return?

int foo(int n)
{
  char b[n+3];
  return sizeof(b);
}

int main()
{
  int s1 = foo(8);
  
}

s1 = 11. Why s1 = 11? In this case, at compile time, the compiler can't know the size of array b. The size is known at the run time. That is to say, sizeof() can be evaluated at run time for some case. But for the general case, it is evaluated at compile time.

what does sizeof() return?

class ABase{ 
        int iMem; 
}; 

class BBase : public virtual ABase { 
        int iMem; 
}; 

class CBase : public virtual ABase { 
        int iMem; 
}; 

class ABCDerived : public BBase, public CBase { 
        int iMem; 
}; 

int main()
{
   int s1 = sizeof(ABase);
   int s1 = sizeof(BBase);
   int s2 = sizeof(CBase);
   int s4 = sizeof(ABCDerived);
}

s1 = 4, s2 = 12, s3 = 12 and s4 = 24. Why ? In this case, because BBase and CBase are derived from ABase virtually, they will also have an virtual base pointer (different from the pointer to virtual function table). So, 4 bytes will be added to the size of the class (BBase and CBase). That is sizeof ABase + size of int + sizeof Virtual Base pointer.Size of ABCDerived will be 24 (not 28 = sizeof (BBase + CBase + int member)) because it will maintain only one Virtual Base pointer.

what does sizeof() return? (edited on Aug 12)

void foo(int a[])
{
  cout << sizeof(a) << endl;
}

int main()
{  
   int a[10];
   foo(a);
   cout << sizeof(a) << endl;
}

. The sizeof() in foo() will return 4 while the one in main() will return 40. The reason is that the a in foo() is actually interpreted as a integer pointer.

Thursday, July 21, 2011

The Ugly Thing about char [], char , const char , char * const and char const *

These concepts seem simple, but mistake can be made if you haven't thoroughly understood them.

is following code correct?

foo()
{
char a[]= "I HATE U!";
a[0] = 'U';
}

Yes, array a[] will be allocated on stack and a[0] just modifies the first character.

is following code correct?

foo()
{
char *a = "I HATE U!";
a[0] = 'U';
}

Compiler won't complain, but the program will crash. The reason is "I HATE U!" is a constant that will be allocated in the constant memory region. If we try to modify its value thru pointer a, you will be hit by seg fault.
However, initializing the variable takes a huge performance and space penalty for the array (using char a[] = "XXXX"). Only use the array method if you intend on changing the string, it takes up space in the stack and adds some serious overhead every time you enter the variable's scope. Use the pointer method otherwise.

is following code correct?

foo()
{
cont char *a = "I HATE U!";
a[0] = 'U';
}

No, compiler will complain. a is a pointer points to char constant, you can modify what a points to (a's value), but you can't modify the content of the address that a points to (*a's value).

is following code correct?

foo()
{
char * const a = "I HATE U!";
a++;
}

No, compiler will complain. a is a constant pointer, therefore you can'y modify what a points to (a's value). But you can modify the content of the address that a points to (*a's value) if that address is valid to access. In the above example, the address is not valid to access.

is following code correct?

foo()
{
char const *a = "I HATE U!";
a[0] = 'U';
}

No, compiler will complain. char const *a is the same as const char * a.

Monday, July 18, 2011

Find the Sub-Matrix with the Largest Sum in a Matrix

Problem: Given a m*n matrix that is made up of integers (positive and negative), find the sub-matrix with the largest sum.

Solution: There is an O(n) algorithm to find the sub-array with the largest sum in a one-dimension array. So here, we will try to reduce our problem to the one-dimension array problem.

Assume the sub-matrix is between row a and b where 0<=a<=b<=m, then we discard all the rows that are out of the range [a, b]. We then squeeze the sub-matrix into a one-dimension array. How to squeeze? Just use the sum of the elements remaining in each row to replace that row! Therefore we get a one dimensional array with n elements (each element is a sum).
Then apply the O(n) algorithm we mentioned previously, we will get the sub-array with the largest sum in this one-dimension array, which is actually the sub-matrix between row a and b with the largest sum.
Until now, we have examined one pair a and b. If we try all the combination of a and b, we will be able to find the the sub-matrix in the with the largest sum in the m*n matrix. Assume n<m, we need repeat n^2 such operation. Then the overall time complexity is O(n^3). Naive way may take O(n^4).
In order to make the "squeeze" operation (summing the column elements) with a constant time complexity, we need to do some pre-processing. We just need to calculate the prefix-sum. For example, c[i][j] = a[0][j] + a[1][j] +...+ a[i][j] . It is easy to calculate this for the whole matrix with time complexity O(n^2).

Search for an Element in a Matrix

Problem: A m*n matrix with its each row and each column sorted in ascending order. Given a target element, find that element in the matrix or return null if not found.

Solution: The following gives an O(m+n) algorithm.

Start from matrix[0][n-1], compare it with the target element. If equal, return; if matrix[0][n-1] is smaller than the target value, go vertically down to next row (examine matrix[1][n-1]); if matrix[0][n-1] is bigger than the target value, go horizontally left to the previous column (examine matrix[0][n-2]).
Compare the current cell in the matrix with the target element. If equal, return; if current cell is smaller than the target value, go vertically down to next row. if current cell is bigger than the target value, go horizontally left to the previous column.
Stop when it reaches the last row or first column or the target element is found.
We actually can use binary search here.

Sunday, July 17, 2011

Paint a Cube with Three Colors

Problem: Given a cube, each face can be painted with one of the three colors. Calculate the ways the cube can be colored.

Solution: The problem can be solved by enumeration. While a more formal way is to use Burside's Lemma. Generally we need to understand the symmetry of the permutation on cubes. The main ideas of this method are listed as follows:

find the permutation groups of a cube, the easiest one is the identity permutation (just don't change). Then for the six faces, we have 1->1, 2->2 ... and 6->6. So for this permutation, there are 6 circles and the length of each circle is 1. Therefore we have a1^6, where a1 means the sub group that with circle length 1. "6" means there are 6 such circles.
then we can rotate along the axis that penetrates the centers of two opposite faces for 90 degree. If we do this, we have a1^2*a4, since two face unchanged, and the rest 4 form a circle. We have 6 such rotations. Therefore at the end we have 6*a1^2*a4.
then we can rotate along the axis that penetrates the centers of two opposite faces for 180 degree. If we do this, we have a1^2*a2^2.We have 3 such rotations. Therefore at the end we have 3*a1^2*a2^2.
then we can rotate the two opposite vertices (diagonal). We will have a3^2, since three faces that share one vertex are in one circle. We have 8 such rotations. Therefore at the end we have 8*a3^2.
then we can rotate along the axis that penetrates the middle points of two parallel edges that are not in the same face for 180 degree. We will have 6*a2^3 at the end.
so totally we have 1+6+3+8+6 = 24ways of permutations. So the cycle index (the ways to paint the cube) = 1/24*(a1^6+6*a1^2*a4+3*a1^2*a2^2+8*a3^2+6*a2^3). Since a1=a2=...=a6, we have cycle index = 1/24*(a^6+3*a^4+12*a^3+8*a^2).
when a = 3, we have cycle index = 57.

Find the Kth Positive Integer That Can Be Divided Only by 3, 5 or 7

Problem: find the kth positive integer that only have factors as 3, 5, or 7. For example, the first integer is 3, then 5, 7, 9, 15, 21......

Solution: It is a DP problem where we should leverage on previous status. Then we can avoid redundant computation. The main ideas are stated as follow:

Have three queues, let us say Q3, Q5 and Q7. Put 3, 5 and 7 in these queues, receptively.
Then select the smallest element among the three queues by just comparing the heads of the queues. If the smallest element s is from Q3, Q3.enque(s*3), Q5.enque(s*5) and Q7.enque(s*7); If the smallest element s is from Q5, Q5.enque(s*5) and Q7.enque(s*7); If the smallest element s is from Q7, Q7.enque(s*7). Since we strictly follow the ascending order, we guarantee that there is not redundant computation.
Repeat the selection (step 2) for k times, what we get is the kth positive integer that only have factors as 3, 5, or 7.
The time complexity and space complexity are both O(k).

Saturday, July 16, 2011

Calculate the Occurrence of "2" from 1 to N

Problem: Given an integer N, calculate the occurrence of digit "2 " in all the numbers from 1 to N. For example, if N=24, then the integers that have digit "2" are 2, 12, 20, 21, 22, 23 and 24. So totally eight "2"s ( "22" contains two "2"s).

Solution: It is a straightforward solution, Just inspect each digit starting from the tail. For each digit, we need to think of three cases: >2, =2 and <2. The code is listed as follow:

int calculte_2(int input)
{
     int cnt = 0;
     int base = 10;
     int quotient,remainder;

     do
     {
        quotient= input/base;
        remainder = (input%base) / (base/10);
        
        if(remainder>2) cnt += (quotient+1)*base/10;
        else if(remainder<2) cnt +=  quotient*base/10;
        else if(remainder==2) cnt +=  quotient*base/10 + (input%(base/10)) + 1;               
        base*=10;
           
     }while(quotient>0);
     
     return cnt;     

}

Friday, July 15, 2011

Calculate Square Root and Cubic Root

Problem: Calculate the square root of a non-negative real number.

Solution: The basic idea is to start with a seed, then begin trial. If our guess (seed) is smaller, than we increase it. Otherwise, we decrease it. However, to get a good convergence speed, we need guess the square root in a smart way. The following describes Babylonian method, which is a quadratically convergent algorithm.

Start with a seed x, assume S is the real number we want to know its square root, calculate S/x.
Then the square root of S must be in between x and S/x. We choose the average of x and S/x as the new guess and repeat step 1. If certain accuracy is achieved, we stop.
To select the starting seed, if S has d digits and S>1, choose 2*10^((d-1)/2) if S is odd, choose 6*10^((d-2)/2) if S is even. The case for S<1 is similar.

More: This is actually a special case of Newton's method. Newton's method can be used to find the roots of a real value function. The basic idea is that from a point (x0, y0) on the curve, we draw a tangent, assume the tangent across x axis at x1. Assume the root is xr, f(xr) = 0, we have |x1-xr| < |x0-xr|. If we keep draw tangent at x1, we will be even closer to xr. Basically we have x_k+1 = x_k- f(x_k) / f'(x_k).6

More more: This method can be extended to calculate the cubic root as well. The only thing we need to change is rather than calculating the average of x and S/x as the new guess, we calculate the average of x , x and S/x^2 as the new guess. Therefore, x' = (2*x+S/x^2)/3.

Tuesday, July 12, 2011

Maximum Flow and Minimum Cut Problem

Problem: Given a network that contains a single source and a single sink, find the maximum (aggregated) flow from the source to the sink. This is Maximum Flow problem.

Solution: The solution is as follow:

find a flow in which each edge has strictly positive capacity. Then find the bottleneck edge in the flow. Subtract each edge's capacity with the bottleneck edge's capacity.
try to find another flow in which each edge has strictly positive capacity. Repeat step 1.
if such flow cannot be found, terminate. The maximum flow is the sum of all the capacity of the bottleneck edge of the flow processed.

Minimum Cut problem is from Maximum Flow problem. Basically, first run maximum flow algorithm to find those bottleneck edges. Then the minimum cut is from those edges.

More: Maximum flow (minimum cut) is frequently used in the problem related to flow network. Some other problem which can be solved by (or converted to) maximum flow problem are maximum matching in bipartie graph, minimum path cover, etc.

See Also: Maximum Flow and Minimum Cut

Monday, July 11, 2011

Check If a Graph is a Bipartite Graph

Problem: Given a graph, check it is a bipartite graph. A bipartite graph is a graph whose vertices can be divided into two sets. Then for every edge, the vertices connected by the edge are distributed in the two different sets (can't be both in one set).

Solution: If is similar to a coloring problem. We can do a BFS traversal to the graph. We need some extra space to store i) if a vertex is visited ii) the parity of that vertex. Or we can just use two hash tables.

start from any vertex in the graph, we mark that vertex as visited, its parity as 1. Then we set the parity of its neighbors as 2. Since all the neighbors are not visited now, we put them in a queue.
remove a vertex from the queue, if it is visited, ignore it. Otherwise, mark it as visited, check all its visited neighbors, if the parities are all odd or even, then the graph is not bipartite. Otherwise, set the parity of its non-visited neighbors as i+1 (assume the current vertex's parity is i) and add non-visited vertices to the queue.
repeat the above step util the queue is empty. If the graph is not fully connected, need one extra step to check if all the vertices are visited

Sunday, July 10, 2011

Reverse the Words in a String in Place

Problem: Reverse the order of words in a string. For example, given "I like Juventus", "Juventus like I" should be the output.

Solution: First in place reverse the whole string. We will get "suntevuJ ekil I". The we just reverse the characters in each word.

Saturday, July 9, 2011

The Comparison between BST, Hashtable, Array and Linked List

BST:

locating/deleting/inserting an element is not slow -- O(logn)
maintain the order between elements
support range query
if not balanced, the worst case is like a linked list

Hashtable:

look up operation is fast: O(1) if the hash function is well chosen.
have space overhead
don't support range query

Array:

allow random access
can explore cache locality
updating is costly

Linked List:

allow easy insertion/deletion
do not support random access
may not have cache locality

Tuesday, July 5, 2011

Find the Kth Smallest Element of Two Sorted Arrays

Problem: given two sorted arrays, find the Kth smallest element.

Solution: if we have two pointer scanning from the heads of two arrays, we can have an O(K) algorithm. Here we give the main idea of an O(logm + logn) algorithm, where m and n are the length of the two arrays, respectively.

We pick the ith and jth elements from two arrays. We make i+j = K-1. Then if Ai > Bj && Ai < Bj+1, Ai is the Kth smallest element. The other way is the same (Ai < Bj).
If the previous condition doesn't hold, which means Ai < Bj && Ai < Bj+1. Then the Kth smallest element cannot be within A0 to Ai and Bj+1 to Bn-1. Then we remove them and only need to look for the K-i-1th smallest element in the sub-arrays left.

Find the Median of Two Sorted Arrays

Problem: given two sorted arrays, find the median.

Solution: Assume the length of the two arrays are m and n. When m+n is even, the median is the average of two numbers. The neat solution has a time complexity of O(log(m+n)). Basically, we need to leverage binary search. The details of this algorithm is very complex due to many corner cases to handle. The following just gives the main idea.

get the median of two arrays Ai and Bj, where i = m/2 and j = n/2. If Ai <= Bj, the median will be between Ai and Bj.
therefore, we can discard A0 to Ai-1 and Bj+1 to Bn-1. However, we cannot do this in a naive way. To reduce to an equivalent sub-problem, we need to discard the same number of elements from each array. Then we keep comparing the middle elements of the sub-arrays. To discard the same number of elements, we just need to get Min(i, n-j-1).

Besides, we can also use the techniques in Find the Kth Smallest Element of Two Sorted Arrays.

More: a more general problem is to find the median for K sorted arrays. There are two ways:

Guess a number, search each array to see how many elements are smaller than this number. Then we can have a total number. If this total is smaller than half, we guess a bigger number; otherwise, we guess a smaller number. Try to repeat binary search until our goal is met.
The second approach is to first find the medians of all the arrays. Then we can know the bounds of these medians (low_m, high_m). For each array, throw the elements that are out of the bounds. Make sure the elements thrown at the two ends of the array should be the same number. Then repeat until our goal is met.

Estimate 3^100

Problem: estimate the value of 3^100.

Solution: A so called "72" rule will be used. The rule states if the percentage of annual GDP increase is a, it takes 72/a years to double the GDP. So 3^100 = 9^50 = 10^50/1.1^50 = 10^50/(1.1*1.1^(7*7)) = 10^50/(1.1*2^7) = 10^50/140 = 1000/140 *10^47 = 7*10^47. Another way is to leverage 3^9 = 20000.

Longest Common Subsequence

Problem: Longest common subsequence is the subsequence that appear in multiple sequences (usually two) simultaneously. For example, the longest common sequence of "13486" and "23861' is "386". The diff program is based on this algorithm. Besides, this algorithm is frequently used in bioinformatic research.

Solution: Still we will use DP, but different from longest Increasing subsequence, the solution we have is O(n*m), where n and m are the length of two sequence. If we have more than two sequence, the time complexity will be O(n1*n2*n3*...). The following gives the algorithm for two sequences:

look at the last elements A[n], B[m] of two sequences, if A[n] == B[m], we know the longest common subsequence (LCS) between A[n] and B[m], LCS(A[1:n], B[1:m]) = LCS(A[1:n-1], B[1:m-1]) +1.
if A[n] <> B[m], then LCS(A[1:n], B[1:m]) = MAX(LCS(A[1:n-1], B[1:m]), LCS(A[1:n], B[1:m-1])).
then we will have a matrix for LCS, after filling up the matrix, we will get the answer.

We can further do some optimization. One is to optimize the space, the LCS matrix. If most diffs center on the middle of sequence, we can trim the common part first. Also to be fast, we don't need to compare each character, we can hash the string or the line. Then we just compare the hash value sequences.

More: The solution to LCS problem can be easily applied to Edit Distance problem. Given two string Xi and Yj, and a number of operations such as copy(), delete(), insert(), etc. Find the optimal way to transform Xi to Yj. Each operation has a cost and our goal is to minimize the total cost of the transformation. Here only gives the recursive relations:

Define T(i,j) as the total cost to transform Xi to Yj (one direction). There are several recursive relations.
T(i,j) = T(i-1, j-1) + cost(copy), if X[i] == Y[j]
T(i,j) = T(i-1, j-1) + cost(replace), if X[i] <> Y[j]
T(i,j) = T(i-1, j) + cost(delete)
T(i,j) = T(i, j-1) + cost(insert)
find T(i,j) as the minimum T(i,j) in step 2,3,4,5

Longest Increasing Subsequence

Problem: Longest increasing sequence is the subsequence within a sequence that the elements in the subsequence is in an increasing order. We are trying to find the longest subsequence that has this property. For example, one of the longest increasing subsequences in "17385" is "138".

Solution: This is a classical DP problem. Naive DP solution will give you a O(n^2) algorithm by keep the longest increasing sequence ended at i. A better solution is to use DP alongside with binary search, which will give you a O(nlogn) algorithm. The solution is as follow:

having an array M, M[j] stores the ending position of the increasing sequence with length j in the original sequence X. Thus, X[M[j]] should be an increasing sequence as well.
iterate the original sequence X, when visiting X[i], do binary search in X[M[j]] to look for the closest position to X[i]. If X[i] is bigger than all the existing X[M[j]], it means if we append X[i] to the end of the current longest increasing sequence, we get a new longest increasing sequence which is longer by 1. Then we update M[L+1] = i (L is the length of current longest increasing sequence).
when searching in in X[M[j]] for X[i], we can also encounter the situation that X[M[j]]<X[i]< X[M[j+1]]. Then we need to update M[j+1] = i. The reason is that after we have scanned the ith element in X, we need to assure for M[j], we store the increasing sequence of length j that has the smallest ending element. This will maximize our chance to be able to append further elements to existing increasing sequence.

Sunday, July 3, 2011

Get the Kth Permutation

Problem: Given an array of N numbers {1,2...N}, get the Kth permutation. For example, when N = 3, the first permutation is "123", the second is "132", the sixth is "321".

Solution: With STL, we can use next_permutation(). Just call it K-1 times. However, next_permutation()'s complexity is not constant. A better way is to use recursion:

String Find(string a, int k)
{
   if(len(a)==1 or k==0) return a;

   int N = len(a);
   
   int i = k/factorial(N-1);
   int j = k%factorial(N-1);

   return a[i] + find(a.substring(0,i) + a.substring(i+1), j);
}

Binary Heap

Binary Heap is an important data structure which can be used to implement priority queue. Two types of binary heap are commonly seen: min-heap and max-heap. A binary heap has two properties:

Shape property: the heap should be a full binary tree (can be denoted as an array)
heap property: children should be smaller (bigger) than parent

When inserting an element into heap, we insert from bottom (leaf). Then we check if the two properties hold, otherwise we swap parent and child. When removing an element from heap (always from root), we use the last element at the last level to replace root, then we adjust heap. So,

The time complexity of insertion is O(logN)
The time complexity of removal is O(logN)
The time complexity of building a heap can be O(N)

With binary heap, a K-way merge can be done in O(N*logK), naive ways take O(N*K). Besides, we also have min-max heap which can insert remove the min and max element in O(logN). Basically, it is a mixed heap with odd level to be min heap and even level to be max heap.

Collecting Apples

Problem: The original problem is from TopCoder. Given a M*N matrix, in each cell, there are a number of apples. When you visit one cell, you can grab the apple away. You make three passes to collect apples. In the first pass, you start from top-left, go either right or down until reading bottom-right. In the second pass, you start from bottom-right, go either up or left until reading top-left. In the third pass, you go from top-left to bottom-right again. What are the maximum number of apples you can collect?

Solution: This is a DP problem. But first we need to do some reduction to convert the problem to a easier one. If the problem only contains the first pass, it is an easy DP problem.

One key observation is the second pass can be reversed so that the problem can be transferred to that three paths starting from top-left to bottom-right.
We create a status matrix MAX[][][], MAX[i][j][k] denote at a certain round, the column coordinates of the three paths (you can get the row coordinates from i,j,k and round number x).
From the previous position to current position, at most three cells should be visited. They are (x-i, i), (x-j, j) and (x-k, k). Then we can get the apples at the three cells (remember don't add the duplicate cell). There are multiple ways (2*2*2) to reach these three cells (from up or from left). They we need to leverage the MAX[][][] from previous round to find a previous state to the three cells which can maximize MAX[i][j][k].
There are totally M+N-2 rounds and after the final round, the answer will be stored in MAX[N-1][N-1][N-1]

The key algorithm is as follow:

public int mostApples(string[] L) {

    int X=L[0].Length, Y=L.Length, XY=X+Y;
    int[,,] best = new int[X,X,X]; 
    int[,,] b2 = new int[X,X,X];     
     
    for (int A=0; A<X; A++) 
    for (int B=0; B<X; B++) 
    for (int C=0; C<X; C++)    
        best[A,B,C] = -999999999;

    best[0,0,0] = 0;

    for (int t=1; t<=X+Y-2; t++)
    {    
        for (int A=0; A<X; A++) 
        for (int B=0; B<X; B++) 
        for (int C=0; C<X; C++)
        {
           int aa = t-A, bb=t-B, cc=t-C;
           if (aa < 0 || bb < 0 || cc < 0 || aa >= Y || bb >= Y || cc >= Y) continue;
  
           int bestHere = 0;
           int delta = (int)(L[aa][A] - 48);
           
           if (B != A) delta += (int)(L[bb][B] - 48);
           if (C != A && C != B) delta += (int)(L[cc][C] - 48);
 
           if ( A>0 &&  B>0 &&  C>0) bestHere = Max(bestHere, best[A-1, B-1, C-1] + delta);
           if ( A>0 &&  B>0 && cc>0) bestHere = Max(bestHere, best[A-1, B-1, C  ] + delta);
           if ( A>0 && bb>0 &&  C>0) bestHere = Max(bestHere, best[A-1, B,   C-1] + delta);
           if ( A>0 && bb>0 && cc>0) bestHere = Max(bestHere, best[A-1, B,   C  ] + delta);
           if (aa>0 &&  B>0 &&  C>0) bestHere = Max(bestHere, best[A,   B-1, C-1] + delta);
           if (aa>0 &&  B>0 && cc>0) bestHere = Max(bestHere, best[A,   B-1, C  ] + delta);
           if (aa>0 && bb>0 &&  C>0) bestHere = Max(bestHere, best[A,   B,   C-1] + delta);
           if (aa>0 && bb>0 && cc>0) bestHere = Max(bestHere, best[A,   B,   C  ] + delta);
           b2[A,B,C] = bestHere;
         }
 
         int[,,] temp = best; 
         best = b2; 
         b2 = temp;
  }
 
return best[X-1,X-1,X-1];
}

Thursday, June 30, 2011

Get the Intersection Points of Two Link List

Problem: There are two link lists that have an intersection point (they have the same end), which makes them forms a "Y" shape. Find the intersection point.

Solution: Definitely we can traverse one list and hash all the nodes. Then traverse the other list. But if we want a O(1) space complexity, we can try the following two methods:

Traverse both lists to get the their length. Then move the pointer from the head of the longer list by len(long_list) - len(short_list) steps. Then move both pointers in parallel and compare.
Traverse one list, make the last node point to the head to form a circle and remember the length l. Move one pointer l steps, starting from the head of the second list. Then also start to move a second pointer, still from the head of the second list. Two pointers move in parallel. The node they meet each other is the intersection point.

Tuesday, June 28, 2011

Segment Tree and Interval Tree

To build a Segment Tree, first sort all the end points of the segments, than adding -infinity and +infinity. For n segments, we then partition (-infinity, +infinity) into 2n+1 segments. These 2n+1 segments serve as elementary intervals. From them (considering them as leafs), we can build a balanced binary tree. Then we add the n segments into the tree. Each node represents an interval. Non-leaf nodes represent the interval which is the union of their children. When adding a segment, we store the information of that segment in all nodes where the nodes' intervals are within the segment's interval. Besides, the nodes' parents' interval are not within the segment's interval.

building cost is O(nlogn), space cost is O(nlogn), the cost to find all segments that contain a point (stabbing query) is O(k+logn)
can also solve box overlap problem, Klee's measure problem.
For multi-level segment tree, first build a base segment tree on one axis. Then build small segment tree on each node of the base tree, only for the segments stored on that node.
For more details, please check http://www.cnblogs.com/TenosDoIt/p/3453089.html (in Chinese).

To build an Interval Tree, it is a little bit similar. First sort all the end points of the intervals and find the median points. Every node in the interval tree is associated with a point. And the node contains all the intervals that cover the point. The left child (sub-tree) of the node contains all the intervals whose upper bounds are smaller than the node's point. Similarly, we can define the right child.

building cost is O(nlogn), space cost is O(n), the cost to find all intervals that overlap with an interval (range query) is O(k+logn).
The result is a ternary tree with each node storing:
- A center point
- A pointer to another node containing all intervals completely to the left of the center point
- A pointer to another node containing all intervals completely to the right of the center point
- All intervals overlapping the center point sorted by their beginning point
- All intervals overlapping the center point sorted by their ending point
If we need to do a range query with interval tree, see we want to find the set of r that overlap with interval q. We first to sort all the start/end points of intervals, then find those have either start/end inside q. Second, we need select one point in q, do the query on the interval tree to find all the intervals that have the point. At the end, we need to merge results and do the dedup.

See Also: Klee's measure problem

KMP algorithm

KMP algorithm stands for Knuth–Morris–Pratt algorithm. The problem it wants to solve is to search for a word W in a text string S efficiently. The followings only give the main idea of this algorithm:

The essence of the algorithm is to try to avoid comparing a character more than once. Assume we have two cursors m and i. m points to the start matching position for W within S. n points to the current location within W where we fail (we only had a partial match). To search for W in the rest of S, we don't need to start the next match check beginning at m+1, we can choose the position in a smart way.
To be smart, we need the help from an auxiliary table T . T has the same length as W. T[i] means if the match fails at the position i, how much we need to look back (reset the start cursor). Thus, m will be reset as m+i-T[i]. Moreover, we don't need to start matching process from the new start position, instead, we can just start from the new i that equals toT[i].
This table is mainly for those patterns which contain some non-prefix substring that are also some prefix of the patterns. For example, in "ABCDABD", substring starting at position 4, "AB", is also the prefix "AB". On the contrary, for "ABC", there are no need to fall back.
The time complexity is O(n+k), where n and k are the length of S and W, respectively.

The following gives the table building algorithm:

table[0] = 0;
    table[1] = 0;

    int cnd = 0;

    for(int n =2; n<p.size();)
    {
       if(p[n-1] == p[cnd])
       {
          cnd++;
          table[n] = cnd;
          n++;
       }
       else if(cnd>0)
       {
         // This is the complicated part, if we can't find 
         // a match,  we need to gradually fall back, 
         // to look at shorter string to see if we can 
         // find a match.
         cnd = table[cnd];
       }
       else
       {
          cnd = 0;
          table[n] = cnd;
          n++;
       }

    }

Reservoir Sampling

Reservoir sampling is a sampling algorithm that randomly pick k elements out of a set of n elements. It is adapted from Knuth Shuffling. The key idea is to maintain a buffer with size k, then replace the elements in buffer with a descending probability. The application of this algorithm is when sampling a finite subset from streaming data. Reservoir sampling is very useful for online sampling on streaming data.

int buf[k];
//initialize the buffer
for(int i=0; i<k; i++)
   buf[i] = a[i];

for(int i=k; i<n; i++)
{
   int idx = random(0, i);
   if(idx<k) swap(buf[idx], a[i]);
   
}