Monday, August 27, 2012

Comparing int with Enum

Problem: This may not be an interview question but it is quite interesting. Let's first look at some examples. If we have the following code that compares an int with an enum, what will be printed out? We will test these examples using gcc.

*Updated the post inspired by Nemo's great comments.

enum MY_ENUM{
  MY_OK = 0,
  MY_NOT_OK
};

void foo()
{
   int i = -1;
   enum MY_ENUM my_enum = MY_OK;

   if( i < my_enum) printf("I am OK!\n");
   else printf("I am NOT OK!\n");

}

For the above example, we will see "I am NOT OK!". Why? The reason is that in this example my_enum is regarded as unsigned int. Therefore, i will be converted to unsigned before the comparison. After the conversion, it is always non-negative, so the else branch is taken. But what about this code snippet (just a little bit of modification):

enum MY_ENUM{
  MY_OK = 0,
  MY_NOT_OK
};

void foo()
{
   int i = -1;
   enum MY_ENUM my_enum = MY_OK;

   if( i < MY_OK) printf("I am OK!\n");
   else printf("I am NOT OK!\n");

}

It will print "I am OK!" instead. The reason is that in this example MY_OK is regarded as int.

Why do we see results like this? It is kind of confusing. To find out what happens under the hood, we need to look at the C standard. There are several sections related to our problem.

  • Section 6.4.4.3 says enumerators have type int. 
  • Section 6.7.2.2/4 states that the enumeration type has a compatible type capable of representing the values of all enumerator members.
  • Section 6.7.2.2/2 suggests that an enumerator initialized with a constant expression that falls outside the range of an int has unspecified or undefined behavior.
Section 6.7.2.2/4 can be used to explain the first example. The compatible type capable of representing the values of all enumerator members is unsigned int, since we only need to represent 0 and 1. Therefore my_enum is unsigned int. Section 6.4.4.3 explains the second example, since MY_OK is an enumerator.

Then what if you want the enumeration type to be int? Just follow Section 6.7.2.2/4 and define the enumeration as follows:

enum MY_ENUM{
  MY_NEG = -1,
  MY_OK = 0,
  MY_NOT_OK
};

Since we only need to represent -1, 0 and 1 here, the enumeration type needs to be int.

We should also be careful about Section 6.7.2.2/2, since it suggests that if we define something as follows, the behavior will be unspecified:

enum MY_ENUM{
  MY_NEG = -1,
  MY_OK = 0,
  MY_BIG = 0x80000000
};
0x80000000 can't be represented in a 32-bit int. Though gcc still accepts this definition, we cannot expect uniform behavior from other compilers.
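Going back to the first two examples, the two behaviors are easy to reproduce side by side. This is a minimal sketch assuming gcc, where the compatible type chosen for this enum is unsigned int (a compiler that picks int would answer differently); the helper function names are made up:

```c
enum MY_ENUM {
  MY_OK = 0,
  MY_NOT_OK
};

/* compare against an enum VARIABLE: i is converted to the
   enum's compatible type (unsigned int under gcc) */
int less_than_enum_var(int i, enum MY_ENUM e)
{
   return i < e;
}

/* compare against the ENUMERATOR itself: MY_OK has type int */
int less_than_enumerator(int i)
{
   return i < MY_OK;
}
```

With gcc, less_than_enum_var(-1, MY_OK) is false while less_than_enumerator(-1) is true, matching the two printouts above.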

Sunday, April 15, 2012

Find the Maximum of Two Integers without Comparison

Problem: Given two integers a and b, find the bigger one between them without using comparison (e.g., "if" statement).

Solution: Here we need a little bit of "bit" operation. We know that if a-b >= 0, max(a,b) = a; otherwise, max(a,b) = a - (a-b). Then you may understand the following hack:

int max(int a, int b)
{
   int c = a-b;

   //assume int has 4 bytes; shifting c rightward
   //by 31 bits leaves the sign bit (1 if c is
   //negative, 0 otherwise). Right-shifting a negative
   //int is implementation-defined, so we mask with 1,
   //assuming the common arithmetic shift. Note that
   //a-b may overflow for operands of opposite signs.
   int k = (c >> 31) & 1;

   return  a - k*c;

}


Saturday, April 14, 2012

Find the Number of Groups of "1" in a Matrix

Problem: Given a matrix, each cell of the matrix is either "1" or "0". Consider that all the adjacent "1"s form a group; find out how many such groups are in the matrix. By "adjacent", we mean a cell's left, right, up and down neighbors. Diagonal cells are not considered "adjacent". For example, if we have the following matrix:

1 0 0 1
1 1 0 1
0 1 1 1
1 0 0 0

then the output will be 2. There are two groups of "1" here.

Solution: This is where disjoint-set can help. The algorithm is outlined as below:

  • Basically, we scan the matrix row by row and from left to right. We need some auxiliary data structure stat[][] to keep some states. The states include the parent of a cell (it actually means which group a cell belongs to and each group has one unique parent) and the group id.
  • Once we encounter a cell with value "1", if we find its left and up neighbors are both "0", then we can create a new group, and this cell also becomes the parent of the group.
  • If we find its left neighbor is "1" and up neighbor is "0", then the current cell belongs to the same group where its left neighbor is. Similarly, if we find its left neighbor is "0" and up neighbor is "1", then the current cell belongs to the same group where its up neighbor is. 
  • If we find both its left neighbor and up neighbor are "1", we need to do some Union() operation. First we need to find which group has a smaller group id between its left neighbor's group and up neighbor's group. Then we make the parent of the group that has a bigger id point to the parent of the group that has a smaller id. By doing this, we actually merge the two groups together and decrease the total number of groups by 1. If two groups have the same id, we don't need to do the Union().
  • We can also do some optimization such as path compression. Here we actually use a simplified disjoint-set approach. The total time complexity is close to O(M*N) where M and N are the dimensions of the matrix.
The code is as follows:
typedef struct cell
{
  int value;
  pair<int, int> parent;
}cell;


pair<int, int> parent(int x, int y, cell stat[][M])
{
    //here we also implemented path compression
    if(stat[x][y].parent.first!=x || stat[x][y].parent.second!=y)
    {
        stat[x][y].parent =  parent(stat[x][y].parent.first, stat[x][y].parent.second, stat);
    }
    return stat[x][y].parent;

}


int group_of_1(int a[][M])
{
    cell stat[N][M];

    int group = 0;
    for(int i=0; i<N; i++)
    for(int j=0; j<M; j++)
    {
       if(a[i][j])
       {
         int g1,g2;
         g1 = g2 = -1;
         pair<int, int> p1, p2;

         if(i>0 && a[i-1][j])
         {
            p1 = parent(i-1, j, stat);
            g1 = stat[p1.first][p1.second].value;

         }
         if(j>0 && a[i][j-1])
         {
            p2 = parent(i, j-1, stat);
            g2 = stat[p2.first][p2.second].value;

         }

         if(g1<0 && g2<0)
         {
             group++;
             stat[i][j].value = group;
             stat[i][j].parent = pair<int, int>(i,j);
         }

         if(g1<0 && g2 > 0) stat[i][j].parent = p2;
         

        if(g1>0 && g2<0)   stat[i][j].parent = p1;
         

        if(g1>0 && g2 > 0)
         {
             stat[i][j].value = g1 < g2 ? g1:g2;

             if(g1<=g2)
                  stat[p2.first][p2.second].parent = stat[i][j].parent = p1;

             
             if(g1>g2)
                  stat[p1.first][p1.second].parent = stat[i][j].parent = p2;
               
             if(g1!=g2) group--;

         }

      }

    }

    return group;
}


Friday, April 13, 2012

Four Men and Eight Hats

Problem: There are four men and eight hats (five black hats and three red hats). Each man will wear one hat, but he can't see which hat he is wearing, instead he can only see the hats the other three persons are wearing. Then each man is asked to guess which hat he is wearing (black or red?). The first man said he can't decide. The second man said he also can't decide. The third man, again, said he can't decide. Then it is the turn of the last man, can he or can't he make the right guess?

Solution: The key is to understand what "a man can't decide" exactly means. Let's name the four men A, B, C, D. When A can't decide, it must be true that B, C, D are NOT all wearing red hats; since there are only three red ones, if B, C, D were all wearing red hats, A could be sure that he is wearing a black hat. Let's use the same way to analyze B's situation. When B can't decide, it must be true that C, D are NOT both wearing red hats. Then let's examine C. When C can't decide, it must be true that D is NOT wearing a red hat. Then, based on all this information, D can be sure he is wearing a black hat.


Friday, March 23, 2012

Graceful Way to Shift an Array

Problem: We have an array of integers (or other type), given an integer k,  we want to shift the array elements by k. For example, if a[] = {1, 2, 3, 4, 5} and k = 2, the resulting array will be a[] = {4, 5, 1, 2, 3} .

Solution: This is an easy problem but a graceful solution is not that straightforward. By "graceful", I mean a solution with O(n) time complexity and O(1) space complexity. The trick is to use a series of reversals. The details are as follows:

  • First, reverse the whole array. If a[] = {1, 2, 3, 4, 5}, after reversing, we have a[] = {5,4,3,2,1}.
  • Then reverse the subarray from idx 0 to idx k-1, with the previous example we have a[] = {4,5,3,2,1}. Then reverse the subarray from idx k to idx n-1, after this, we get a[] = {4, 5, 1, 2, 3}.
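The reversals above can be sketched in C as follows (a minimal sketch; the function names are made up, and the right-rotation convention from the example is assumed):

```c
/* reverse a[lo..hi] in place */
void reverse(int a[], int lo, int hi)
{
   while(lo < hi)
   {
      int tmp = a[lo];
      a[lo++] = a[hi];
      a[hi--] = tmp;
   }
}

/* rotate a[0..n-1] so the last k elements move to the front,
   e.g. {1,2,3,4,5}, k=2 -> {4,5,1,2,3} */
void shift_array(int a[], int n, int k)
{
   k %= n;
   reverse(a, 0, n - 1);   /* whole array:   {5,4,3,2,1} */
   reverse(a, 0, k - 1);   /* first k:       {4,5,3,2,1} */
   reverse(a, k, n - 1);   /* remaining n-k: {4,5,1,2,3} */
}
```

Each element is touched a constant number of times and only one temporary is used, giving the promised O(n)/O(1) bounds.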

Thursday, March 22, 2012

about Josephus problem

Problem: The Josephus problem is a classical problem which can be solved by DP. Assume n persons stand in a circle; starting from the 1st person, we eliminate a person after skipping a fixed number of persons. The process stops when there is only one person left, and we want to know the last person's index in the original circle. For example, if n = 5 and k = 2, at the beginning we have 1 2 3 4 5; we eliminate 2 and 4, then 1 and 5, so the last person's index is 3.

Solution: For the general case where k is an arbitrary number, we use F(n, k) to denote the solution for n persons. Then we have F(n, k) = (F(n-1, k) + k) mod n. Why? We can think of the problem as follows:

  • The first person we need to eliminate is the kth person, after he is removed, we have n-1 persons now.
  • However, we cannot directly resort to the subproblem F(n-1, k), since now our process will start at the (k+1)th person of the original circle. Therefore we need to remap the indices. Since the process is applied on a circle, we can move the first k-1 persons (the 1st, 2nd, ..., (k-1)th person of the original circle) to the end of the circle. Thus, if the new index in F(n-1, k) is i, the corresponding index in F(n, k) is (i + k) mod n.
The above approach takes O(N) time. For small k, an O(logN) solution actually exists. For example, when k = 2, we have F(2n) = 2*F(n) - 1 and F(2n+1) = 2*F(n) + 1. Actually for k = 2, even an O(1) solution exists: if n = 2^m + l where 0 <= l < 2^m, the last person will be 2*l + 1.
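The recurrence is easiest to run bottom-up with 0-based indices, converting to the 1-based answer at the end. A sketch of the O(N) approach:

```c
/* returns the 1-based index of the survivor among n persons,
   eliminating every kth person */
int josephus(int n, int k)
{
   int res = 0;               /* F(1) = 0 in 0-based indexing */
   for(int i = 2; i <= n; i++)
      res = (res + k) % i;    /* F(i) = (F(i-1) + k) mod i */
   return res + 1;
}
```

For the example above, josephus(5, 2) returns 3.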

Monday, March 12, 2012

Calculate the Number of "1" in the Binary Form of a Positive Integer

Problem: Given a positive integer, count the number of "1" in its binary form. For example, the binary form of "7" is "111", so the number of "1" is 3.

Solution: There is an elegant solution with a complexity of O(k) where k is the number of "1"s. The code is as follows:

int count_numof1(int n)
{
 
   int cnt = 0;

   while(n)
   {
      n = n & (n-1);
      cnt++;
   }
  
   return cnt;
}

Sunday, March 11, 2012

Calculate log2() Using sqrt()

Problem: Calculate log2()  by only using sqrt().

Solution: The key here is to know that log2(1+x) = x/ln2 ≈ x/0.69 when x is very close to 0. Therefore the algorithm first uses sqrt() to reduce x to be very close to 1, and then applies the formula.
double log2(double x)
{
   assert(x>0);

   const double threshold = 1e-6; /* how close to 1 we require x to be */
   double res = 1.0;

   /* each sqrt() halves log2(x), so we double res to compensate */
   while(x-1 > threshold)
   {
      x = sqrt(x);
      res = res * 2;
   }

   /* log2(x) ≈ (x-1)/ln2 when x is close to 1 */
   return res * (x-1)/0.69;
}

Find the Minimum Set of Numbers That Sum to at least K.

Problem: Given an array of integers, find a smallest subset (with the minimum number of integers) that sums to at least K. For example, if a[] = {2,-5,7,8,12, 4} and K = 13, then one minimum set is {8, 12}.

Solution: One solution is to first sort the array, then, beginning from the largest integer, accumulate the sum until it is no less than K. This solution (depending on which sort algorithm you use) usually costs O(nlogn). Here we introduce a selection-based algorithm which costs O(n) on average but O(n^2) in the worst case. The approach is similar to finding the kth order-statistic.

  • Randomly choose a pivot from the current array and partition the array into two parts p1 and p2: p1 contains integers which are smaller than the pivot; p2 contains integers which are larger than or equal to the pivot.
  • Check whether pivot + sum(p2) == K; if so, we have found the answer. If not, then if pivot + sum(p2) > K but sum(p2) <= K, we have also found the answer (the pivot plus all of p2). If sum(p2) > K, we need to partition further, but only within p2. If pivot + sum(p2) < K, we also need to partition further, but within p1 instead, and the target value K is updated as K - pivot - sum(p2).
The code is as follows:
void find_min_set(int *a, int start, int end, int target)
{

    int s, e, k;

    s = start;
    e = end;
    k = target;

    int sum;

    srand ( time(NULL) );  /* seed the RNG once, not inside the loop */

    while(1)
    {
       /* randomly pick a pivot index in [s, e] */
       int idx = rand() % (e-s + 1) + s;
       
       /* partition and sum */
       int tmp = a[idx];
       a[idx] = a[e];
       a[e] = tmp;

       sum = a[e];
       int pos = s;
       for(int i = s; i < e; i++)
       {
          if(a[i] < a[e])
          {
             if(pos != i)
             {
                tmp = a[pos];
                a[pos] = a[i];
                a[i] = tmp;
             }
            pos++;
          }
          else sum+= a[i];

       }

      tmp = a[pos];
      a[pos] = a[e];
      a[e] = tmp;

      if(sum == k)
      {
        cout << "found " << endl;
        return;
      }
      else if(sum > k)
      {
         if(sum - a[pos] < k)
         {
            cout << "found " << endl;
            return;
         }
         s = pos + 1;
         sum = 0;

      }
      else
      {
        k = k - sum;
        sum = 0;
        e = pos - 1;
      }
    }

}

                   

Wednesday, March 7, 2012

Calculate Number of Different Ways to Climb a Stairway with Constant Space and Linear Time

Problem: There are three ways to climb a stairway: you can climb 1 step, 2 steps or 3 steps each time. Given an n-step stairway, how many ways can you climb up to the top?

Solution: As is well known, this is a simple DP problem. If we maintain an array dp[] with dp[i] representing the number of different ways to climb up i steps, we can get the final result in O(N). However, we need O(N) space as well. But actually, since dp[i] = dp[i-1] + dp[i-2] + dp[i-3], we don't need O(N) space; we just need a cache of size 3. Then we have an algorithm with O(1) space complexity:

int climb_stair(int n)
{
  int cache[3];
  cache[0] = 1; /*ways to climb 0 steps */
  cache[1] = 1; /*ways to climb 1 step */
  cache[2] = 2; /*ways to climb 2 steps */

  if(n<=2) return cache[n];

  for(int i=3; i<=n; i++)
  {
     cache[i%3] = cache[0]+cache[1]+cache[2];
  }

  return cache[n%3];

}

Similarly, we can have a O(1) space algorithm for Fibonacci numbers.
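For instance, here is a sketch of the same rolling-cache idea applied to Fibonacci (using the convention fib(0) = 0, fib(1) = 1; for large n the result overflows int, which is not handled here):

```c
int fib(int n)
{
   if(n == 0) return 0;

   int prev = 0, cur = 1;    /* fib(0) and fib(1) */
   for(int i = 2; i <= n; i++)
   {
      int next = prev + cur; /* fib(i) = fib(i-1) + fib(i-2) */
      prev = cur;
      cur = next;
   }
   return cur;
}
```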

User Bin Crash problem

Problem: This is a problem from Facebook Buzz. A plane is going to crash, and you need to throw out W lbs of freight to survive the crash. Each type of freight fi is associated with a unit weight wi and unit price pi. If we throw one unit of fi out, we lose pi but the plane becomes lighter by wi. Our mission is to throw out freight with the minimum aggregate price while avoiding the crash, which means the total weight of the thrown freight should be no less than W.

Solution: The problem is very similar to the 0-1 knapsack. We can also use similar strategy to solve it. The detail is as follow:

  • Basically, we first sort the different types of freight by their unit weight.
  • Then we fill an array dp[n][W+1] where n is the number of freight types. dp[i][j] represents the minimum cost to throw out at least j lbs of freight using only the first i types of freight (ranked by unit weight, from light to heavy).
  • To fill dp[i][j], we either don't throw any more units of the ith type, or throw one more unit of it: dp[i][j] = min(dp[i-1][j], dp[i][max(0, j - ith type's unit weight)] + ith type's unit price). We don't need any greedy selection; the regular dp suffices.
  • The time complexity of this approach is O(nW).
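Since row i of the table only depends on row i-1 and earlier entries of row i, the table collapses to a single row. A sketch in C (the function name is my own; I assume unlimited units of each freight type, matching the dp[i][j - wi] recurrence, and clamp the index at 0 for the "at least j lbs" boundary):

```c
#include <limits.h>

/* w[i], p[i]: unit weight and unit price of freight type i;
   returns the minimum price to shed at least W lbs */
int min_price_to_shed(int n, const int w[], const int p[], int W)
{
   int dp[W + 1];
   dp[0] = 0;
   for(int j = 1; j <= W; j++)
   {
      dp[j] = INT_MAX;
      for(int i = 0; i < n; i++)
      {
         /* "at least j lbs": anything shed beyond j is fine */
         int rest = (j - w[i] > 0) ? j - w[i] : 0;
         if(dp[rest] != INT_MAX && dp[rest] + p[i] < dp[j])
            dp[j] = dp[rest] + p[i];
      }
   }
   return dp[W];
}
```

For example, with two hypothetical types (w=3, p=5) and (w=1, p=2) and W=4, the best choice is one unit of each, for a total price of 7.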

About Disjoint-Set Forests

Disjoint-Set Forests are very useful for solving problems such as finding connected components in a graph. For example, suppose we have several sets {a, b, c}, {d, e}, {f}. The elements in a set are vertices that are connected to each other. If we have an edge e(b,f), then we can merge the sets {a,b,c} and {f} since now they are connected.

The common operations on a disjoint-set are Make-Set, Union and Find-Set. Here Union is the operation that merges two sets, and Find-Set, given an element, finds out which set this element is in. To model the disjoint-set problem, we can use a forest such that each tree is a set. The representative of a set is the root of that tree, and each element in the tree has a parent pointer. If we want to do Union, we can just make one tree a subtree of the other. If we want to do Find-Set, we just start from that element and follow its parent pointers until we reach an element whose parent is itself.

However, to make the operations efficient, two optimizations can be made:

  1. Union by rank: In order to keep Find-Set efficient, we need to keep the depth of the tree small. To do that, for each tree we maintain a rank which stands for the largest depth of the tree. When we Union two trees, we make the tree with the smaller rank a subtree of the tree with the bigger rank. Basically, we make the parent pointer of the root of the smaller-ranked tree point to the root of the bigger-ranked tree.
  2. Path compression: To further improve Find-Set, the idea is that after we do a Find-Set, we have a path from the root to the query element. We can then modify every element on this path to point its parent pointer directly at the root, which shortens the path for future queries.
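Both optimizations fit in a few lines with plain parent and rank arrays. A minimal sketch (MAX_N and the names are arbitrary; elements are assumed to be integers 0..MAX_N-1):

```c
#define MAX_N 1000

int parent[MAX_N];
int rnk[MAX_N];

void make_set(int x) { parent[x] = x; rnk[x] = 0; }

int find_set(int x)
{
   if(parent[x] != x)
      parent[x] = find_set(parent[x]);  /* path compression */
   return parent[x];
}

void union_set(int x, int y)
{
   int rx = find_set(x), ry = find_set(y);
   if(rx == ry) return;
   /* union by rank: attach the shallower tree under the deeper one */
   if(rnk[rx] < rnk[ry])       parent[rx] = ry;
   else if(rnk[rx] > rnk[ry])  parent[ry] = rx;
   else { parent[ry] = rx; rnk[rx]++; }
}
```

With both optimizations, a sequence of m operations costs nearly O(m) (the inverse-Ackermann bound).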

Tuesday, March 6, 2012

Convert an Array into a Sorted Array with Minimum Cost

Problem: Given an array of positive integers, you have only two operations: a) decrease an integer by k with a cost of k; b) delete an integer with a cost of the value of that integer. Use these two operations to convert the array into a sorted array with minimum cost. For example, for a[] = {5,4,7,3}, the optimal way is to decrease "5" by 1 and delete "3", with a total cost of 4 (1+3). The resulting sorted array is {4, 4, 7}.

Solution: Here I give an O(N^2) solution using DP. The key data structure we need is dp[i], which records the minimum cost to convert the subarray a[0, i] into a sorted array whose last element is a[i]. For the example array a[] = {5,4,7,3}: dp[0] = 0, dp[1] = 1 (since we need to decrease "5" by 1), dp[2] = 1, dp[3] = 7. After we have filled up dp[], we need one more data structure: aggr[], where aggr[i] = a[0] + a[1] + ... + a[i]. Then the minimum cost to convert the whole array is cost_min = min{dp[i] + aggr[N-1] - aggr[i]}. Here "aggr[N-1] - aggr[i]" is the cost of deleting all the elements with a subscript larger than i.

Then the key is to calculate dp[i]. There are several observations here:
  1. if a[i] >=  a[j] (j<i),  we can construct a sorted array ended at a[i] by deleting a[j+1], a[j+2], ... , a[i-1] and then append a[i]. The cost of the resulting sorted array is dp[j] + the cost of deleting a[j+1], a[j+2], ... , a[i-1]. 
  2. if a[i] < a[j], to have a[i] as the last element, we need to decrease a[j] by a[j] - a[i]. Moreover, we may not stop at a[j]: we need to keep looking backward until we find an a[m] such that a[m] <= a[i]. The cost of the resulting sorted array is then dp[m+1] + ∑(a[k] - a[i]) where m < k <= j, plus the cost of deleting a[j+1], a[j+2], ..., a[i-1].
  3. To compute dp[i], we need to look at all j such that j < i; depending on a[j] and a[i], we calculate the cost based on case 1 or 2. The minimum cost we encounter is the value of dp[i].
The code is as follows:
int min_convert_cost(int a[], int n)
{
    int dp[n];
    int aggr[n];

    aggr[0] = a[0];
    for(int i=1; i<n; i++)
        aggr[i] = aggr[i-1] + a[i];

    dp[0] = 0;
    for(int i=1; i<n; i++)
    {
       dp[i] = INT_MAX;
       for(int j=i-1; j>=0; j--)
       {
          int cost_i = 0;

          if(a[i] >= a[j])
          {
             cost_i = dp[j] + aggr[i-1] - aggr[j];
          }
          else
          {
              cost_i += aggr[i-1] - aggr[j];
              /* walk backward with a separate index so the
                 outer loop variable j is not clobbered */
              int m = j;
              while(m >= 0 && a[m] > a[i])
              {
                cost_i += a[m]-a[i];
                m--;
              }

              cost_i += dp[m+1];

          }
          if(cost_i < dp[i])
               dp[i] = cost_i;

       }
    }

   int min = INT_MAX;
   for(int i=0; i<n; i++)
   {
      int cost_i = dp[i] + aggr[n-1] - aggr[i];
      if(cost_i<min) min = cost_i;
   }

   return min;
}



Monday, March 5, 2012

String Reduction

Problem: A string contains only three types of letters: "a", "b", "c". Our goal is to reduce the string. If two different letters are adjacent to each other, we can reduce them to the third letter. The rules are: if we encounter "ab" (or "ba"), we reduce it to "c"; if we encounter "bc" (or "cb"), we reduce it to "a"; if we encounter "ca" (or "ac"), we reduce it to "b". If two adjacent letters are the same, we can't reduce them. We need to find an optimal way to reduce the string as much as possible.

For example, for the string "acbb", if we try to reduce "ac" first, we get "bbb". But if we reduce "cb" first, we get "aab", and we can further make "aab" => "ac" => "b". The latter approach gives us an optimal reduction.

Solution: Here what really matters is the numbers of letters "a", "b" and "c". Let us name them num_a, num_b and num_c. If two of them are zero, we can't reduce the string at all. Otherwise, we can reduce the string into a string of length 1 or 2. If num_a, num_b and num_c are all even or all odd, we can reduce to a string of length 2; if not, we can reduce to a string of length 1.

Then how to do the reduction? The detail is as follow:

  1. if the string has a length of 3 and contains one "a", one "b" and one "c", whatever order the three letters are in, we can only reduce it into a string of length 2; if the string has a length of 2 and contains two different letters, we can only reduce it into a string of length 1. Let us regard these two cases as base cases.
  2. for a general case, we have num_a "a"s, num_b "b"s and num_c "c"s. After every reduction, the sum of num_a, num_b and num_c decreases by 1, since we substitute one letter for two letters.
  3. Then at each round, which two adjacent letters do we choose to reduce? We try to find an adjacent pair that contains a letter with the highest count. For example, if we currently have 3 "a"s, 4 "b"s and 6 "c"s in the string, we choose an adjacent pair that contains "c", since num_c = max(num_a, num_b, num_c). Can we find such a pair? Definitely. If there are multiple such pairs, choose any one. After we reduce this pair, num_c decreases by 1, and max(num_a, num_b, num_c) may decrease by 1, remain unchanged, or increase by 1. However, max(num_a, num_b, num_c) is bounded above by num_a + num_b + num_c, which decreases after every round, so max(num_a, num_b, num_c) must also decrease in the long run. Therefore, by using our strategy, max(num_a, num_b, num_c) will eventually decrease to 1, which means we reach the base cases in step 1.
  4. Then when will the string be reduced to a length of 1 and when to a length of 2? We observe that if num_a, num_b and num_c are all even, then after one reduction they all become odd; similarly, if they are originally all odd, after one reduction they all become even. Then, according to the analysis in step 3, max(num_a, num_b, num_c) will eventually decrease to 1, and the counts should still be all odd at that time (since "1" is odd). Therefore, at the very end, we will have num_a = 1, num_b = 1 and num_c = 1, which eventually leads to a string of length 2. It is easy to prove that if num_a, num_b and num_c are not all even or all odd, the string will be reduced to length 1.
  5. if num_a, num_b and num_c are not all even or all odd, there can be only two cases: a) two of the counters are odd and one is even; b) two of the counters are even and one is odd. For example, if num_b is odd while num_a and num_c are both even, the string will eventually be reduced to "b".
  6. if num_a, num_b and num_c are all even or all odd, there might be multiple different final forms.
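If all we want is the length of the optimally reduced string, the whole case analysis boils down to a few parity checks on the counts. A small sketch (the function name is made up):

```c
#include <string.h>

/* length of the string after optimal reduction,
   for strings over the alphabet {a, b, c} */
int reduced_length(const char *s)
{
   int cnt[3] = {0, 0, 0};
   for(int i = 0; s[i]; i++)
      cnt[s[i] - 'a']++;

   /* two counters zero: the string is all one letter, nothing reduces */
   if((cnt[0] == 0) + (cnt[1] == 0) + (cnt[2] == 0) >= 2)
      return (int)strlen(s);

   /* all counts of the same parity -> length 2, otherwise length 1 */
   int same_parity = (cnt[0] % 2 == cnt[1] % 2) && (cnt[1] % 2 == cnt[2] % 2);
   return same_parity ? 2 : 1;
}
```

For "acbb" the counts are (1, 2, 1): not all of the same parity, so the answer is 1, matching the "acbb" => "aab" => "ac" => "b" example.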

Find the Minimum Vertex Cover for a Tree

Problem: Given a tree, find its minimum vertex cover. Wait, what is a vertex cover? Given an undirected graph G(V,E), a vertex cover is a subset of V such that for any edge e in E, at least one of e's two endpoints is in this subset (the vertex cover).

Solution: Finding the minimum vertex cover for a general graph is an NP-hard problem. However, for a tree there is a linear solution. The idea here is to do a DFS plus post-order traversal. When we encounter a leaf node and the edge connecting it to its parent, we know that to construct a vertex cover we must include at least one of the two nodes (the leaf, or its parent). Here we can use a greedy approach: selecting the leaf doesn't give us any extra benefit, while selecting the parent can, since the parent may also be connected to other nodes. By selecting the parent node, we can further "cover" some extra edges. With this strategy in mind, our algorithm is as follows:

  • we do a DFS search. When a DFS call on a child node returns, we check if the child and the parent are both unselected. If yes, we select the parent node.
  • After the DFS finishes (we have traversed the tree), the selected nodes form the minimum vertex cover. The cost is O(N).
The pseudo code is as follows:
void min_vertex_cover(TreeNode *root)
{
   if(isLeaf(root)) return; 

   for(int i=0; i<root->num_of_children; i++)
   { 
        min_vertex_cover(root->children[i]);

        if(!root->selected && !root->children[i]->selected)
               root->selected = true;
         
   }
}

Friday, March 2, 2012

Find the Diameter of a Convex

Problem: Given a convex polygon that contains n vertices, find its diameter: basically, the pair of vertices with the longest distance between them.

Solution: This is the essential subproblem for the more general problem of finding the farthest pair among n points, but for now let's focus on this problem. Assume the vertices of the convex polygon are ordered counter-clockwise (clockwise is also OK) and labeled v1, v2, ..., vn. We can use a method that mimics "rotating calipers" around the convex polygon to solve the problem in O(n).

First we need to introduce the concept of an antipodal pair: the diameter must be the distance between some antipodal pair. Therefore, if we find all antipodal pairs, we can get the diameter. Then how do we find all the antipodal pairs efficiently? The details are as follows:

  1. For edge vnv1 (the line that connects vn and v1), we follow the counter-clockwise order to find the vertex that is farthest from vnv1. We name it vk.
  2. Then we move to edge v1v2, again, we try to find the vertex that is farthest from v1v2. We name it vp. It is easy to see any vertex from vk to vp form an antipodal pair with v1, then we add the pairs (v1, vk), ... , (v1, vp) to our antipodal set.
  3. Then we let vk = vp  and move to edge v2v3. Again try to find the vertex that is farthest from v2v3. This vertex will be the new vp. Repeat this process until we finish processing the edge vn-1vn.
  4. Now we get all the antipodal pairs, just do a linear scan to find the maximum.
When calculating distances, we can use the signed area, which avoids square root calculation.

More: To solve the farthest pair of points problem, we basically first find the convex hull of the points, then apply this convex-diameter algorithm. The total cost is O(nlogn), dominated by the cost of finding the convex hull.
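As an illustration of the signed-area trick, twice the signed area of a triangle is a single cross product; comparing these values for a fixed edge tells us which vertex is farthest from that edge, with no square roots involved (a small helper sketch):

```c
/* twice the signed area of the triangle (ax,ay), (bx,by), (cx,cy).
   Its magnitude is |ab| times c's distance from line ab, so for a
   fixed edge ab, larger |area2| means a farther vertex c. */
long long area2(long long ax, long long ay, long long bx, long long by,
                long long cx, long long cy)
{
   return (bx - ax) * (cy - ay) - (by - ay) * (cx - ax);
}
```

The sign also tells on which side of ab the point c lies (positive for the counter-clockwise side), which is handy when walking the hull.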

Thursday, March 1, 2012

Decide If a Point is Inside a Simple Polygon

Problem: Given a point x,  try to decide if x is inside a simple polygon P. P is denoted by its vertices {p1, p2, ... , pn} .

Solution: There exists an O(N) solution based on one important observation: any ray from x will cross the boundary of P an odd number of times if and only if x is inside P. Here a ray is a half-line starting from a point. Before designing an algorithm based on this observation, there are two cases we need to be cautious about. One is when the ray crosses some vertices of P; in that case, we need another way to define the number of times the ray crosses the boundary of P. The second case is when x is actually on the boundary of P, but this is easy to tell. Therefore, the skeleton of the algorithm looks as follows:

  1. Decide if x is on the boundary of P, if so return True, otherwise return False. This step takes O(N) since there are N edges.
  2. Find a ray that starts from x (we can just use a vertical ray, which can end below the lowest y coordinate of the polygon). For each edge of P, check whether the ray intersects that edge. Each such check takes O(1), and the total takes O(N). Here we need to handle the special case where the ray crosses some vertices of P. Assume i1, i2, ..., ik are the points at which the ray intersects the boundary. If none of these points is a vertex of P, we define the number of crossings as k. If some of them are vertices, a naive count is ambiguous; one way out is to count only the crossings at non-vertex points, or simply to pick a ray that avoids all vertices.
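One common way to implement the crossing test is the classic "pnpoly" formulation, which treats each edge as half-open and thereby sidesteps most vertex special cases. A sketch (points exactly on the boundary still need the separate check from step 1):

```c
/* returns 1 if (x,y) is inside the polygon given by px[], py[]
   (n vertices in order), 0 otherwise; boundary points not handled */
int point_in_polygon(int n, const double px[], const double py[],
                     double x, double y)
{
   int inside = 0;
   for(int i = 0, j = n - 1; i < n; j = i++)
   {
      /* does edge (j,i) straddle the horizontal line through y,
         and does it cross it to the right of x? */
      if(((py[i] > y) != (py[j] > y)) &&
         (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i]))
         inside = !inside;
   }
   return inside;
}
```

The strict/non-strict comparisons make each edge half-open, so a crossing exactly at a shared vertex is counted for exactly one of the two incident edges.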

Find the Greatest Common Divisor (GCD)

Problem: Given two non-negative integer a and b, find the greatest common divisor.

Solution: A very efficient algorithm is Euclid’s algorithm. The main idea here is GCD(a, b) = GCD(b, a mod b). Therefore, the code is straightforward:

int GCD(int a, int b)
{
   if(b==0) return a;
  
   return GCD(b, a % b);
}

Tuesday, February 28, 2012

The Diameter of a Binary Tree

Problem: Given a binary tree, there is a path connecting any two nodes u, v in the tree. Among all possible paths, the longest one is considered the diameter of the binary tree. Find the length of this path.

Solution: We can use recursion to solve this problem. We need to be careful that the longest path may reside entirely in one of the subtrees of the root. The code for this algorithm is as follows:

int height(TreeNode * root)
{
   if(!root) return 0;
   
   return 1+max(height(root->left), height(root->right));
}

int Diameter(TreeNode *root)
{
   if(!root) return 0;

   int h_left = height(root->left);
   int h_right = height(root->right);
   
   return max(1+h_left+h_right, max(Diameter(root->left), Diameter(root->right)));

}

Memoization might be used to avoid repeated calculation.

Monday, February 27, 2012

An O(N^2) Algorithm for Optimal BST

Problem: Given a set of ordered keys and their corresponding probabilities to be visited, construct an optimal BST tree with minimum search cost.

Solution: This is the classical optimal BST problem. The well-known solution has O(N^3) complexity. However, Knuth found a better way that achieves O(N^2). Basically, when we try to find the root of an optimal BST over keys i..j, previously we needed to iterate from i to j. But there is an observation: root[i, j-1] <= root[i, j] <= root[i+1, j]. Given this inequality, we can reduce the range over which we look for the root: the new range [root[i, j-1], root[i+1, j]] is smaller than [i, j]. The detailed solution and proof can be found in Knuth's original paper.
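A sketch of the O(N^2) algorithm with integer frequencies (e[i][j] is the minimal weighted search cost for keys i..j; the only change from the O(N^3) dp is that the candidate root r ranges over [root[i][j-1], root[i+1][j]] instead of [i, j]):

```c
#include <limits.h>

/* freq[0..n-1]: visit frequencies of the n ordered keys;
   returns the minimal total weighted search cost */
int optimal_bst(const int freq[], int n)
{
   int e[n + 2][n + 1];     /* e[i][j]: min cost for keys i..j      */
   int w[n + 2][n + 1];     /* w[i][j]: freq[i-1] + ... + freq[j-1] */
   int root[n + 2][n + 1];  /* root[i][j]: optimal root for i..j    */

   for(int i = 1; i <= n + 1; i++)
      e[i][i - 1] = w[i][i - 1] = 0;   /* empty ranges */

   for(int len = 1; len <= n; len++)
      for(int i = 1; i + len - 1 <= n; i++)
      {
         int j = i + len - 1;
         w[i][j] = w[i][j - 1] + freq[j - 1];
         e[i][j] = INT_MAX;
         /* Knuth's bound: root[i][j-1] <= root[i][j] <= root[i+1][j] */
         int lo = (i < j) ? root[i][j - 1] : i;
         int hi = (i < j) ? root[i + 1][j] : j;
         for(int r = lo; r <= hi; r++)
         {
            int t = e[i][r - 1] + e[r + 1][j] + w[i][j];
            if(t < e[i][j]) { e[i][j] = t; root[i][j] = r; }
         }
      }
   return e[1][n];
}
```

For example, with hypothetical frequencies {34, 8, 50}, the optimal tree has key 3 at the root and key 1 as the root of its left subtree, for a cost of 50 + 34*2 + 8*3 = 142.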

Sunday, February 26, 2012

Construct All Possible Binary Trees with N Nodes

Problem: Enumerate all possible binary trees with n nodes. For example, if n=1, there is just one tree; if n=2, there are two trees:

         *             or            *
           \                          /
            *                      *

Solution: The key is to understand what can uniquely denote a tree (i.e., serialize a tree). A good way to serialize a binary tree is to record its pre-order traversal plus the nil virtual leaf information. If we denote an actual node as "1" and a nil virtual leaf as "0", the sequence "10100" will stand for the following tree:


                  *          
               /      \                    
             nil       *
                      /    \
                   nil    nil

The way to decide the nil leaves is to check the children of each node. If a node is a leaf in the tree, it will have two nil virtual leaves, since a leaf doesn't have any children. Similarly, the sequence "11000" stands for the following tree:


                  *          
                 /  \                
               *    nil
              /  \
           nil  nil

Therefore, our target is to enumerate all possible sequences. There are several constraints on the sequences:

  • For an n-node tree, the length of the sequence will be 2n+1, with n "1"s and n+1 "0"s.
  • For any position i in the sequence s with i != 2n, the number of "1"s should always be no smaller than the number of "0"s in the prefix s[0, i].
The related code is as follows:
// rebuild the tree from a sequence such as "11000"
TreeNode * rebuild_tree(int s[], int n)
{

    TreeNode *root = create_node();
    stack<TreeNode *> stk;
    stk.push(root);
    
    for(int i=1; i<n; i++)
    {
       if(s[i])
       {
         
         TreeNode *node = create_node();
         if(s[i-1]) 
         {
           stk.top()->left = node;           
         }
         else
         {
           stk.top()->right = node;
           stk.pop();          
         }
         stk.push(node);
       }
       else 
       {
         if(!s[i-1]) stk.pop();
       }
    
    }
    
    return root;

}

//print all possible trees
void output_all_possible_trees(int *seq, int n, int num1, int num0)
{
     
     if((num1 + num0) == 2*n)
     {
        seq[2*n] = 0;
        TreeNode *root = rebuild_tree(seq, 2*n+1);
        print_tree(root);
        return;
     }
        
    if(num1 >= num0 && num1 < n)
    {
        seq[num1+num0] = 1;
        output_all_possible_trees(seq, n, num1+1, num0); 
    }       
    
    if(num0 < num1 && num1 <=n)
    {
        seq[num1+num0] = 0;
        output_all_possible_trees(seq, n, num1, num0+1);  
    
    }
   
}
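
The two constraints above can be cross-checked with a compact Python sketch of the sequence enumeration (the function name is mine); the number of sequences produced for n nodes should be the n-th Catalan number:

```python
def all_tree_sequences(n):
    """Enumerate all pre-order 1/0 sequences describing valid n-node binary trees."""
    results = []

    def extend(seq, num1, num0):
        if num1 + num0 == 2 * n:
            results.append(seq + [0])    # the final closing "0"
            return
        if num1 < n:                     # we may still place a node
            extend(seq + [1], num1 + 1, num0)
        if num0 < num1:                  # "0"s must never outnumber "1"s in a prefix
            extend(seq + [0], num1, num0 + 1)

    extend([], 0, 0)
    return results
```

For n = 3 this yields 5 sequences and for n = 4 it yields 14, matching the Catalan numbers.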

Iterative Binary Tree Traversal with Constant Space Overhead

Problem: Given a binary tree, try to traverse the tree in an iterative way. The space overhead is expected to be constant.

Solution: With a stack, we can iteratively traverse the tree. Here we explain a way to traverse the tree without a stack. The main idea is to use a threaded tree. The key is to leverage the right child pointers of nodes that are null: set each such null right child pointer to the in-order successor of its node. The details of the algorithm are as follows:

  • Inspect a node n. If the node doesn't have a left child, visit the node, then proceed to its right child.
  • If the node has a left child, there are two situations. The first is that the in-order predecessor of n has not been processed yet. Then we need to find the node right_most that is the predecessor of n and set right_most's right child to be n: if n's left child has a right child, right_most is the left child's right-most descendant; if n's left child doesn't have a right child, right_most is the left child itself. Then we proceed to n's left child.
  • The second situation is that the predecessor of n has already been processed. We can tell the two situations apart by following right-child pointers from n's left child until a pointer points to n or is null. If a pointer points to n, the predecessor of n has been processed, so we visit n and proceed to its right child. We can do one extra thing here: before moving forward, we set the right child of n's predecessor back to null, since we previously modified it to thread the tree. By doing this, we keep the tree unmodified after the traversal.
  • The algorithm stops when the current node is null.
The code is below:
void Non_recursive_tree_traversal(TreeNode *root)
{
   if(!root) return;
   
   TreeNode *cur = root;
     
   while(cur)
   {
      // if no left child, visit itself      
      if(!cur->left)
      {
        visit(cur);
        cur = cur->right;        
      }
      // left child is there
      else 
      {
        TreeNode *pp = cur->left;
        
        while(pp->right)
        {
           if(pp->right == cur) break;      
           pp = pp->right;     
        }
        
        // if cur's predecessor has been visited 
        if(pp->right == cur)
        {
           visit(cur);
           // revert the right pointer to null
           pp->right = NULL;
           cur = cur->right;
        }        
        else if(!pp->right)
        {
           // if the right child pointer is null
           // set its value to thread the tree
           pp->right = cur;
           cur = cur->left; 
        }
 
      }     
   }
}

Saturday, February 25, 2012

Get the Max and Min Simultaneously

Problem: Given an array of integers, try to get the max and min in this array with the least number of comparisons.

Solution: If we keep a current_max and a current_min, for each element except the first one we need to compare it against both current_max and current_min. In total, that is 2n-2 comparisons (or 2n-3 if you do it slightly differently). However, a better approach exists by inspecting the integers in pairs. We first compare two adjacent integers, then use the bigger of the two to compare against current_max and the smaller of the two to compare against current_min. For example, if A[i] = 4, A[i+1] = 10, current_max = 11 and current_min = 2, after comparing A[i] with A[i+1] we just need to compare A[i+1] against current_max and A[i] against current_min. Therefore, for every two integers we need 3 comparisons, which leads to roughly 3n/2 comparisons in total.
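
The pairwise technique can be sketched in Python as follows (a minimal sketch, assuming a non-empty array):

```python
def min_max(a):
    """Find (min, max) of a with about 3*len(a)/2 comparisons."""
    n = len(a)
    # seed with the first pair (or the single first element if n is odd)
    if n % 2:
        lo = hi = a[0]
        i = 1
    else:
        lo, hi = (a[0], a[1]) if a[0] < a[1] else (a[1], a[0])
        i = 2
    while i < n:
        # one comparison inside the pair, then one against each extreme
        small, big = (a[i], a[i + 1]) if a[i] < a[i + 1] else (a[i + 1], a[i])
        if small < lo:
            lo = small
        if big > hi:
            hi = big
        i += 2
    return lo, hi
```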

Something about Heapsort

Heapsort has two pleasant properties

  1. an average and worst-case O(N*LogN) complexity
  2. in place algorithm, no extra space is needed.

Heapsort has three steps:

  1. First build a max heap (or min heap); this operation takes O(N).
  2. Swap the root of the heap (the current maximum) with the last element in the heap (the last leaf), then heapify the elements ranging from 1 to N-1. Basically, we restore a max (min) heap on the elements ranging from 1 to N-1.
  3. Repeat step 2 until the heap is empty.
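
The steps above can be sketched in Python (sorting in place; the list is returned only for convenience):

```python
def heapsort(a):
    """In-place heapsort: O(N log N) worst case, O(1) extra space."""
    n = len(a)

    def sift_down(i, size):
        # push a[i] down until the max-heap property holds within a[0:size]
        while True:
            largest, l, r = i, 2 * i + 1, 2 * i + 2
            if l < size and a[l] > a[largest]:
                largest = l
            if r < size and a[r] > a[largest]:
                largest = r
            if largest == i:
                return
            a[i], a[largest] = a[largest], a[i]
            i = largest

    # step 1: build the max heap bottom-up in O(N)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(i, n)
    # steps 2-3: repeatedly move the max to the end and re-heapify the rest
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end)
    return a
```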

Strassen’s Method

Strassen’s method is a classic algorithm for matrix multiplication. The naive approach takes O(N^3), while Strassen’s method costs O(N^log2 7) ≈ O(N^2.81). Basically, its recurrence is T(N) = 7*T(N/2) + O(N^2).

Friday, February 24, 2012

Get the Maximum Profit from Stock -- Maximum Subarray Problem

Problem: Given an array price[], price[i] denotes the stock price at day i. Assume the array has a length of N (from day 1 to day N); try to get the maximum profit by selecting the optimal timing to buy and sell the stock. For example, if price[] = {5, 2, 4, 3, 1}, the maximum profit per share we can get is $2: buy at day 2 and sell at day 3.

Solution: The naive approach takes O(N^2) by checking every pair of days. An improved approach costs O(NlogN) (e.g., by first finding a descending sequence from the beginning). However, the optimal approach costs only O(N). The key is a transformation of the original problem. Reusing the previous example, given price[] = {5, 2, 4, 3, 1}, we can build another array diff[] = {-3, 2, -1, -2} such that diff[i] = price[i] - price[i-1]. Then our goal is just to find the maximum subarray in diff[]. As is well known, there is an O(N) solution for the maximum subarray problem.

More: An alternative that may be conceptually even simpler exists. Just do a linear scan while keeping a current_min of the prices seen so far. For every day, we calculate the gain as the difference between that day's price and current_min; if the gain is more than the maximum gain we are tracking, we update the maximum. This algorithm also takes O(N).
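
The one-pass alternative can be sketched in Python:

```python
def max_profit(price):
    """One-pass scan: track the minimum price so far and the best gain."""
    best = 0
    cur_min = float('inf')
    for p in price:
        if p < cur_min:
            cur_min = p            # best day to buy seen so far
        elif p - cur_min > best:
            best = p - cur_min     # best sell given that buy day
    return best
```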

Friday, February 10, 2012

N Disks and K Pegs -- General Problem of Hanoi Tower.

Problem: Most people are aware of the Tower of Hanoi problem with N disks and 3 pegs. The basic Hanoi problem can be solved either iteratively or recursively. However, here we discuss a more general problem. Given N disks and K pegs, and given a start state (i.e., which peg each disk is on) and an end state, try to find the optimal solution (i.e., the minimum number of moves) that takes us from the start state to the end state. Here we still need to follow the usual Hanoi rule: a larger disk must always be below a smaller disk.

For example, assume we have 4 pegs, and the start state is: pegs[1] = [3,2] (which means disk 3 and disk 2 are on peg 1), pegs[2] = [4], pegs[3] = [1], pegs[4] = []; the end state is pegs[1] = [1], pegs[2] = [3], pegs[3] = [], pegs[4] = [4,2]. The optimal solution is 2->4, 1->4, 1->2, 3->1 (here "2->4" means move the top disk of peg 2 to peg 4), 4 moves in total.

Solution: if the N and K are not too large, we can use DFS to solve this problem. Basically, the total space will be K^N, but we are able to prune out some unfeasible paths.

  • Basically, we try all the possible moves in a DFS manner. If we reach the end state, we stop and return. Meanwhile, we remember how many moves it took to reach the end and store it in current_min.
  • If from the start state we have already made current_min moves, we don't need to go further, since we can't get a result better than current_min.
  • We need a hash table to remember the states we have seen before. If we revisit a state, compare the number of moves we have made now against the number of moves previously needed to reach that state. If we now reach the state in fewer moves, we go forward; otherwise, we can just return.
The pseudo code is as follows:

// the counter for the number of moves from the 
// start state 
int cnt = 0
// the current minimum moves to the end state
int current_min = INT_MAX
// a hash table to remember the visited states
hash_table ht;

void dfs()
{
  if(cnt >= current_min) return;
  // get the current state of disks   
  string stat = get_state()
 
  // if stat is not in hashtable
  if(!ht.has_key(stat))
  {
    ht[stat] = cnt;  
  }
  else
  {
     // if we take more moves to 
     // visit this state, we just
     // return
     if(ht[stat] <= cnt) return;
     ht[stat] = cnt;
  }   

  // if we reach the end state
  if(stat == end_state)
  {
     if(current_min > cnt)
     current_min = cnt;
     return;
  }
  
  for(int i=0; i<K; i++)
  {
     current_disk = peg[i].top();
     // iterate all the possible moves
     // current_disk can make
     for move in possible_move_set
     {
        make the move;
        cnt++;
        dfs();
        rewind the move;
        cnt--;
     }
  }

}

Wednesday, February 8, 2012

String Permutation without Duplication

Problem: This is a classic problem. Given a string "abc",  you need to output all the permutations. They are "abc", "acb", "bac", "bca", "cab" and "cba".  While for input that contains duplicate characters, you need to output all unique permutations. For example, input is "abb", then output should be "abb", "bab" and "bba".

Solution: For strings without duplicate characters, the solution is not difficult. To avoid duplication, you could certainly use a hashtable to memorize the permutations you have already output. Considering that a hashtable may cost a lot of space, other solutions are desired. However, many solutions on the web are actually incorrect. Here is a correct solution using a bit vector (or counting array).

  • The key thing is that if I am about to swap a character c to position idx, I must guarantee that no equal character c has been swapped to position idx before. This is the important constraint that avoids duplicates.
  • We can use an array to track this. If the string only contains characters that range from 'a' to 'z', an array of length 26 suffices. The code is as follows:

void permutation(string &s, int idx)
{

    if(idx == s.size())
    {
        cout << s << endl;
        return;
    }

    char a[26] = {0};

    for(int i = idx; i < s.size(); i++)
    {
        
        if(a[s[i] - 'a'] == 1) continue;

        a[s[i] - 'a'] = 1;

        swap(s[idx], s[i]);
        permutation(s, idx+1);
        swap(s[idx], s[i]);
   }
}

Monday, February 6, 2012

Efficient Way to Calculate Fibonacci Number

Problem: To give a O(logN) solution to Fibonacci function.

Solution: Solving Fibonacci with recursive function is easy. It is also not difficult to have an iterative solution with O(N) time complexity. Then can we do it even better? A solution based on matrix multiplication can help us achieve that.

  • Basically we have a 2*2 matrix A = {[1,1], [1,0]} ({[first row], [second row]}). We can see that a[0][0] = 1 = fibo(2). Then A*A = {[2,1],[1,1]} and a[0][0] = fibo(3) = 2. In general, A^n = {[fibo(n+1), fibo(n)], [fibo(n), fibo(n-1)]}. 
  • Then calculating Fibonacci number basically boils down to calculate the power of A. You may think this still takes O(N). However, we can calculate power efficiently (see this link) such that only O(logN) is needed.
More to mention: Actually there is a closed-form formula that generates Fibonacci numbers based on the golden ratio. However, it involves floating-point operations, which may not be as efficient (or as exact) as the matrix solution.
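
The matrix-power approach can be sketched in Python, squaring A at each step so only O(logN) multiplications are needed (with the convention fib(0) = 0, fib(1) = 1):

```python
def fib(n):
    """Fibonacci via fast exponentiation of A = [[1,1],[1,0]]."""
    def mat_mul(X, Y):
        # multiply two 2x2 matrices
        return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
                [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

    result = [[1, 0], [0, 1]]          # identity matrix
    base = [[1, 1], [1, 0]]
    while n:                           # binary exponentiation
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    # A^n = [[fib(n+1), fib(n)], [fib(n), fib(n-1)]]
    return result[0][1]
```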

Find the Maximums in a Sliding Window

Problem: Given an array of integers and a window that has a size w, slide the window from the left side of the array to the right side, report the maximums in the sliding window during the sliding. For example, a[] = {3,1,5,2,1,4,-3, -2 , 6} and the w = 3. Then at the very beginning, we have [3,1,5] in the sliding window and then the maximum in the window is 5. If we slide the window one step towards the right, we have [1,5,2] in the sliding window and the maximum in the window is still 5. You need to get all these maximums. For this example, the maximum array is {5,5,5,4,4,4,6}.

Solution: It is easy to see that naive approaches take O(wN), since after each slide we need O(w) time to find the maximum. To improve, you can use a skip list: keep a sorted linked list of the integers in the window and build a skip list on top of it. Every slide then requires one insertion and one deletion, so the complexity is O(Nlogw). However, a smarter solution from www.leetcode.com achieves O(N). The details are as follows:

  • Keep a double-ended queue (which lets you manipulate both the head and the tail) of indices. While adding indices, we must guarantee that the integers those indices point to stay in descending order. Therefore, if the index to be added points to an integer bigger than the integers pointed to by some indices already in the queue, we remove all those indices first. For the example shown previously, a[] = {3,1,5,2,1,4,-3,-2,6}: after adding the first integer, the state of the queue is [0], meaning a[0] is in the queue; after adding the second integer, it is [0,1], and since a[0] > a[1] the descending order holds. However, when adding the third integer, we need to pop out both 0 and 1, since a[2] is bigger than both a[0] and a[1]. After adding the third integer, the state of the queue is [2], meaning only a[2] is in the queue.
  • Since it is a sliding window, we also need to retire indices that point to integers that have fallen out of the window. We do this by checking the index at the head of the queue against the current position of the window; if it is out of bounds, we remove it.
  • The size of the queue never exceeds w. Every integer in a[] is inserted into and deleted from the queue at most once. Therefore, the complexity is O(N). 
 You can find the code for this algorithm in  www.leetcode.com.
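
A sketch of the deque-based algorithm in Python (not the leetcode code itself):

```python
from collections import deque

def sliding_window_max(a, w):
    """O(N) sliding-window maximums using a deque of indices.
    The values at the queued indices are kept in descending order."""
    q = deque()
    result = []
    for i, x in enumerate(a):
        # drop tail indices whose values can never be a maximum again
        while q and a[q[-1]] <= x:
            q.pop()
        q.append(i)
        # retire the head index once it slides out of the window
        if q[0] <= i - w:
            q.popleft()
        if i >= w - 1:
            result.append(a[q[0]])   # head always holds the window maximum
    return result
```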

Thursday, February 2, 2012

Find the Unique Element that Appears k Times in an Array

Problem: Let's first come up with some special cases of this problem; then we can generalize. Given an integer array where only one integer appears 2 times and all the other integers appear 3 times, find that integer. For example, for a[] = [2,3,2,4,4,2,3,4], the output should be "3".

Solution: Definitely with hash table or sorting, the problem can be easily solved. What if those approaches are not allowed? Fortunately, we can still use bit operation to solve this problem.

  • We need counters to store the bits that have appeared once and the bits that have appeared twice.
  • If some bit appears three times, we need to reset it in those counters.
  • At the end, the counter that stores the bits that appeared twice is the answer.
Sample code is as follows:
def foo(list):
        # three counters which are used to store
        # the bits that appear 1 time, 2 times and 
        # three times, respectively
        ones = 0
        twos = 0
        threes = 0

        for x in list:
                # which bits had appeared for 3 times?
                # and later we need to reset those
                # bits if they also appear in the 
                # counter "ones" and "twos".
                threes = twos & x
                twos ^= (ones & x)
                twos &= ~threes
                ones ^= x
                ones &= (~threes & ~twos)

        # here we just need to return the counter
        # "twos". If we need the unique element
        # that appears once (assuming the rest
        # all appear 3 times), just return "ones"
        # instead.
        return twos


A more general problem: What if we change the problem? Now, given an integer array where only one integer appears k times and all the other integers appear m times (k < m), find that integer.

Solution: We can use similar strategy (as shown in above code snippet) to solve the problem.

  • Basically, we need m counters (a counter array), where counters[i] stores the bits that have appeared i times.
  • We have counters[m] = counters[m-1] & x, which detects bits reaching m occurrences.
  • counters[i] ^= counters[i-1] & x, and counters[i] &= (~counters[m] & ~counters[m-1] & ... & ~counters[i+1]); at the end, counters[k] holds the answer.
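
An alternative technique that is easier to get right (it is a per-bit population count modulo m, not the counter cascade described above): for each bit position, count how many array elements have that bit set; every element appearing m times contributes a multiple of m, so a remainder of k marks a bit of the unique element. A Python sketch, assuming non-negative 32-bit integers:

```python
def find_k_times(arr, k, m):
    """Find the element appearing k times when all others appear m times (0 < k < m)."""
    result = 0
    for bit in range(32):
        cnt = sum((x >> bit) & 1 for x in arr) % m
        # remainder k means the unique element has this bit set
        if cnt == k:
            result |= 1 << bit
    return result
```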

Tuesday, January 24, 2012

Find the Largest Container in a Histogram

Problem: Given a histogram, pick two bars in the histogram as the left and right sides of a container; the X axis is considered the bottom. Then we have a container which can hold water. Find the container that can hold the largest volume of water. The container must be placed horizontally (you can't rotate the container).

Solution: We can solve this problem in O(n). Basically, two factors decide the volume of the container: 1) the shorter of the two sides and 2) how wide the container is. If the highest bars were at the left and right ends of the histogram, picking those two bars would give the largest container. If the bars at the left and right ends are not the two highest, we need to explore other combinations. Based on these observations, what we need is an increasing sequence scanned from left to right and another increasing sequence scanned from right to left. The details of the algorithm are as follows:

  • First model the histogram as an array h[i]. For example, h[] = {3, 1, 4, 7, 5, 2, 6}. 
  • Then find the two increasing sequences. The one from left to right is left[] = {3, 4, 7} and the one from right to left is right[] = {6, 7}.
  • Begin with the first elements in left[] and right[], calculate the volume of the container formed by these two bars, and store it in max_v. Here the width counts only the bars strictly between the two sides, so the volume is min(3,6)*5 = 15. 
  • Then select the smaller one of the two bars, make one advancement in the array which the smaller bar is from. For our example, between left[0] and right[0], left[0] is the smaller one. Therefore we advance to left[1], which is "4". Then we calculate the volume of the container formed by left[1] and right[0]. The volume is 4*3=12, which is smaller than the previous value 15, so we just keep the old value. 
  • Then we repeat the previous step, just advance the array which the smaller bar is from. We will get   left[2] and right[0]. The corresponding volume is 6*2 = 12, which is still smaller than 15.
  • Then we need to advance in right[], but we find that right[1] and left[2] refer to the same bar. Since at this point both left[] and right[] have been exhausted, the algorithm stops.
  • One more thing, when two bars  left[i]  == right[j], we need to advance both arrays and inspect   left[i+1] and right[j+1] next.
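
The steps above can be sketched as a two-pointer scan over the raw histogram (equivalent to walking the two increasing sequences; following the example's convention, the width counts only the bars strictly between the two sides):

```python
def largest_container(h):
    """Maximum water volume between two bars; width excludes the side bars."""
    i, j = 0, len(h) - 1
    best = 0
    while i < j:
        width = j - i - 1                      # bars strictly between the sides
        best = max(best, min(h[i], h[j]) * width)
        # the shorter side can never do better with a narrower container,
        # so advance the pointer at the shorter bar (both if they are equal)
        if h[i] < h[j]:
            i += 1
        elif h[i] > h[j]:
            j -= 1
        else:
            i += 1
            j -= 1
    return best
```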

Monday, January 23, 2012

Make Best Use of the Conference Room

Problem: Given a conference room and a number of presentations with start and end time ( e.g., [4, 9], [5, 10]), try to make an arrangement which allows the conference room to be used for maximum time. Overlapping presentations can't be in the same arrangement.

Solution: We can have an O(nlogn) solution by using DP. The details are as follows:

  • Sort the presentations by their end time. Thus we have a sorted array end[N], where N is the number of presentations.
  • Have an array Max_arr[N]. Max_arr[i] stands for the maximum time the room can be used if we take end[i] as the closing time of the conference room. It is easy to see that Max_arr[0] = the duration of the presentation that ends at end[0]. We need to find Max_arr[N-1].
  • To calculate Max_arr[i], we first get the start time si of the presentation that ends at end[i]. Then we binary search in end[0] ... end[i-1] for si: we want the largest j such that end[j] <= si. Therefore, Max_arr[i] = max(Max_arr[i-1], Max_arr[j] + end[i] - si).
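
The DP above is weighted interval scheduling; a minimal Python sketch (names are mine; presentations are (start, end) pairs):

```python
import bisect

def max_room_usage(presentations):
    """Maximize total occupied time of the room, O(n log n)."""
    pres = sorted(presentations, key=lambda p: p[1])   # sort by end time
    ends = [e for _, e in pres]
    # best[i]: optimum considering only the first i presentations
    best = [0] * (len(pres) + 1)
    for i, (s, e) in enumerate(pres):
        # rightmost earlier presentation ending no later than this start
        j = bisect.bisect_right(ends, s, 0, i)
        best[i + 1] = max(best[i], best[j] + (e - s))
    return best[-1]
```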

Zeckendorf's Theorem -- How to Represent a Positive Integer with Fibonacci Numbers?

Zeckendorf's Theorem states that every positive integer can be represented as the sum of one or more Fibonacci numbers.

For example, 6 = 3 + 2 + 1 and 10 = 8 + 2. We can further represent these positive integers in a binary form over the Fibonacci numbers. Thus, 6 can be represented as "111", which means fibo(1) + fibo(2) + fibo(3), the sum of the first, second and third Fibonacci numbers. Similarly, 10 can be represented as "10010", which means fibo(2) + fibo(5), the sum of the second and fifth Fibonacci numbers.

How to find the Fibonacci numbers that sum to a particular positive integer? Basically you can use DP and leverage the binary form.

  • 1 can be represented as "1", 2 as "10", 3 as "11", 4 as "101", 5 as "110". Thus, we can observe that the binary form of a positive integer i is just the immediate next number after the binary form of i-1. However, this "immediate next" is slightly different when an integer's binary form is all "1"s. For example, 3 is "11". The immediate next of "11" is not "100" but "101". This is due to the nature of Fibonacci numbers: we know "100" = "11" since fibo(n) = fibo(n-1) + fibo(n-2), so we need to add one more "1" to "100".
  • Based on the above analysis, we can get those Fibonacci numbers for a particular positive integer by using DP.
Actually, not only positive integers: negative integers can also be represented as the sum of "broadly defined" Fibonacci numbers.

Thursday, January 19, 2012

Total Area Covered by a Set of Overlapping Segments

Problem: Given a set of overlapping segments, output the total length they cover. For example, for segments [1,3], [2,5], [6,7], the total length is 5.

Solution: A sweeping line is the right approach for this type of problem. The algorithm for this problem is also the building block for computing the total area covered by a set of overlapping rectangles. The following gives the details of the algorithm:

  • First, sort the points (starts and ends) of all the segments. 
  • Keep an event queue and a counter cnt. cnt stores the number of active segments. When encountering a point p, first look at cnt: if cnt > 0, there must be a previous point p' (the point in the queue immediately before p), so add p - p' to the total length covered. Then add the current point p into the queue. If p is a start point, increase cnt by 1; if it is an end point, decrease cnt by 1.
  • After iterating over all the points, we get the total length covered. We can also choose not to use cnt and instead watch the size of the queue; but then, when encountering an end point, we also need to remove the corresponding points from the queue, which is more complicated.
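
The cnt-based sweep can be sketched in Python:

```python
def total_covered_length(segments):
    """Sweep line: total length covered by possibly overlapping segments."""
    events = []
    for s, e in segments:
        events.append((s, 0))   # 0 marks a start point (sorts before an end tie)
        events.append((e, 1))   # 1 marks an end point
    events.sort()
    total, cnt, prev = 0, 0, None
    for p, kind in events:
        if cnt > 0:
            total += p - prev   # at least one segment was active since prev
        cnt += 1 if kind == 0 else -1
        prev = p
    return total
```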

Longest Subarray with Equal "1" and "0"

Problem: Given an array that only contains "1" and "0", find the longest subarray which contains an equal number of "1"s and "0"s.

Solution: With a hash table, we can have an O(N) solution. The details are as follows:

  • First convert all "0"s to "-1"s, then calculate the prefix sums c[i] = sum(a[0], ..., a[i]). It takes O(N) to calculate all the c[i].
  • Then our task is to find a c[i] and a c[j] such that c[i] = c[j] and |j-i| is maximum; the subarray a[i+1 .. j] then has equal counts. With a hash table mapping each prefix-sum value to its first occurrence, we can finish this job with a linear scan in O(N).
  • There is a special case to handle: when c[N-1] = 0 (assume N is the size of a), the longest subarray is a itself.
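
A Python sketch of the prefix-sum approach (the special case falls out naturally by seeding the map with prefix sum 0 at position -1):

```python
def longest_equal_subarray(a):
    """Length of the longest subarray with equally many 1s and 0s."""
    first_seen = {0: -1}        # prefix sum 0 occurs before the array starts
    best = 0
    prefix = 0
    for i, x in enumerate(a):
        prefix += 1 if x == 1 else -1    # treat 0 as -1
        if prefix in first_seen:
            # same prefix sum seen before: the gap has equal 1s and 0s
            best = max(best, i - first_seen[prefix])
        else:
            first_seen[prefix] = i
    return best
```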

Implement Text Editor with Gap Buffer

The Gap Buffer is a data structure which can be used to implement a text editor. The advantage is that insertion/deletion can be very efficient and the data structure is simple. The disadvantage is that when the cursor changes its location frequently, or the gap is frequently full, there will be a lot of copying operations, which is costly. The following gives more details about the gap buffer.

  • Basically, a gap buffer can be implemented as an array (or a dynamic array) with pointers that separate three regions: the segment before the gap, the gap, and the segment after the gap.
  | the front segment  |   the gap  |  the back segment  |
  • The location of the cursor in the text editor decides the border between the front segment and the gap. An example is given here:
  We had a [            ] big day
  • In the above example, "We had a" is the front segment, "[   ]" is the gap and "big day" is the back segment. 
  • On insertion, new text is filled into the gap, and the start pointer of the gap moves accordingly. If the gap is full, we need to create a new gap, and we may also need to move all the content in the back segment backwards.
  • On deletion, we only need to move the start pointer of the gap backwards if we are deleting text in the front segment, or the end pointer of the gap forwards if we are deleting text in the back segment.
  • If we move the cursor, for example so that it is between "We" and "had": "We [     ] had a big day", we need to copy the "had a" which was originally in the front segment to the back segment.

Wednesday, January 18, 2012

Rabin–Karp Algorithm

The Rabin–Karp algorithm is yet another algorithm for pattern search in a string (also see the KMP algorithm). The problem: given a target string s and a pattern p (also a string), efficiently decide whether s contains p as a substring.

The key of the Rabin-Karp algorithm is hashing, more specifically a rolling hash. With hashing, when checking whether a substring of s matches the pattern p, we don't need to compare each character of s and p. Instead, we just compare hash(substring of s) and hash(p). If the hash values are not the same, the substring doesn't match the pattern. If they are the same, we still need to inspect every character of the substring and the pattern, since hash functions can have collisions. However, this further check is not expected to happen often.

Then the problem is how to efficiently calculate the hashes of all the substrings of s with a specific length. This is where the rolling hash kicks in. One simple example of a rolling hash is to use the sum of the ASCII values of the characters in the substring as the hash value, e.g., s[i] + s[i+1] + ... + s[i+m-1] for the substring s[i ... i+m-1]. Then for the substring s[i+1 ... i+m], we can reuse the hash value we just calculated: if it is h, the hash value of s[i+1 ... i+m] is just h - s[i] + s[i+m]. In practice, a more sophisticated rolling hash is used, for example a polynomial hash with a large prime base: the hash of "hi" can be ascii("h")*101^1 + ascii("i")*101^0.

If there is just one pattern, the Rabin–Karp algorithm is not as good as the KMP algorithm. Rabin–Karp is more useful for multi-pattern matching, where it can be used together with a Bloom filter. A Bloom filter can be built to represent the multiple patterns; then, if a substring's hash hits in the Bloom filter, there is probably a match.
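
A single-pattern Rabin-Karp sketch in Python, using a polynomial rolling hash with base 101 (the base and modulus are illustrative choices of mine):

```python
def rabin_karp(s, p, base=101, mod=(1 << 61) - 1):
    """Return the first index where pattern p occurs in s, or -1."""
    n, m = len(s), len(p)
    if m > n:
        return -1
    high = pow(base, m - 1, mod)          # weight of the outgoing character
    h_p = h_s = 0
    for i in range(m):                    # initial hashes of p and s[0..m-1]
        h_p = (h_p * base + ord(p[i])) % mod
        h_s = (h_s * base + ord(s[i])) % mod
    for i in range(n - m + 1):
        # verify character by character on a hash hit (collisions are possible)
        if h_s == h_p and s[i:i + m] == p:
            return i
        if i + m < n:                     # roll the window: drop s[i], add s[i+m]
            h_s = ((h_s - ord(s[i]) * high) * base + ord(s[i + m])) % mod
    return -1
```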




Dynamic Programming Solution to the Subset Sum Problem

Problem: Given a set of integers, find a subset of them which sums to a target value s.

Solution: We can definitely first sort the set, then use some recursive approach to solve the problem. The complexity is expected to be exponential. Here we give a DP approach, though the complexity is still not polynomial in the input size.

  • First we need to define the state. We have a state function Q(i,k) which indicates whether we can find a subset of the integers a1, a2, ..., ai which sums to k. So Q(i,k) is a Boolean function.
  • We have Q(1,k) = (a1 == k); Q(i,k) = Q(i-1,k) or (ai == k) or Q(i-1, k-ai).
  • Then we just need to fill in the matrix of Q(i,k). We know 0 < i <= the size of the set. For k, we have N <= k <= P, where N is the sum of all the negative integers in the set and P is the sum of all the positive integers in the set.
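
The DP can be sketched compactly in Python by tracking the set of reachable sums instead of a full Boolean matrix (equivalent state space; note this treats the empty subset as summing to 0):

```python
def subset_sum(nums, target):
    """Can some subset of nums sum to target? DP over reachable sums."""
    reachable = {0}              # the empty subset sums to 0
    for x in nums:
        # each previously reachable sum either stays as-is or is extended by x
        reachable |= {r + x for r in reachable}
    return target in reachable
```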

Tuesday, January 10, 2012

Find K Elements with Maximum Minimum Consecutive Difference

Problem: Given a sorted array A, find k elements from the array such that the minimum consecutive difference is maximized. For example, if m1, ..., mk are the selected elements, we want to maximize min(m(i+1) - mi). A more concrete example: a sorted array A = [1, 3, 4, 8] and k = 3. There are four ways of picking 3 elements: {1,3,4}, {1,4,8}, {1,3,8}, {3,4,8}. The minimum consecutive differences for these four selections are 2, 3, 2, 1, respectively. The maximum of these minimum consecutive differences is 3, which comes from the selection {1, 4, 8}.

Solution: [*will re-edit later*] This problem is similar to dividing a line into several segments as evenly as possible. It is also similar to the "Painter and Slates" problem (see a link here). The key here is to use binary search, which gives O(n*log(A[n-1] - A[0])). Using DP is definitely OK, but the complexity is O(k*n^2).
  • Basically, the selection depends on how "wide" a gap we require. If we require a gap as wide as A[n-1] - A[0] (n is the size of the array), we can only select 2 elements. On the other hand, if we require a gap of only min(A[i+1] - A[i]), we can select all n elements. Our goal is to select k elements with the maximized minimum difference, where k is between 2 and n. A binary search on the gap width can find the answer.
  • We do binary search on the space of gap widths. We set low = min(A[i+1] - A[i]), high = A[n-1] - A[0] and let mid = (low + high) / 2. 
  • With mid as the required gap width, we count how many elements we can select. If the count is at least k, we can try to increase the width of the gap: update low = mid. Otherwise, we need to decrease the width of the gap: update high = mid.
  • We stop when we find a gap width such that increasing it further would make it impossible to select k elements.
  • The DP alternative: calculate m[i][j][c], where i and j are indices within A[] and c is how many elements are selected within the region A[i] ~ A[j]. The time to calculate this array is O(N^4).
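
The binary search can be sketched in Python (names are mine; the feasibility check greedily picks elements at least d apart, which is optimal for a sorted array):

```python
def max_min_gap(A, k):
    """Largest d such that k elements of sorted A can be picked with every
    consecutive difference >= d. Binary search on the answer."""
    def count_with_gap(d):
        # greedily pick elements at least d apart, starting from A[0]
        cnt, last = 1, A[0]
        for x in A[1:]:
            if x - last >= d:
                cnt += 1
                last = x
        return cnt

    low, high = 0, A[-1] - A[0]
    while low < high:
        mid = (low + high + 1) // 2    # round up so the loop terminates
        if count_with_gap(mid) >= k:
            low = mid                  # feasible: try a wider gap
        else:
            high = mid - 1             # infeasible: shrink the gap
    return low
```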