Friday, March 23, 2012

Graceful Way to Shift an Array

Problem: We have an array of integers (or of another type). Given an integer k, we want to shift (rotate) the array elements by k positions. For example, if a[] = {1, 2, 3, 4, 5} and k = 2, the resulting array is a[] = {4, 5, 1, 2, 3}.

Solution: This is an easy problem, but a graceful solution is not that straightforward. By "graceful" I mean a solution with O(n) time complexity and O(1) space complexity. The trick is to solve it with reversals. The details are as follows:

  • First, reverse the whole array. If a[] = {1, 2, 3, 4, 5}, after reversing we have a[] = {5, 4, 3, 2, 1}.
  • Then reverse the subarray from index 0 to index k-1; with the previous example we get a[] = {4, 5, 3, 2, 1}.
  • Finally, reverse the subarray from index k to index n-1; after this we get a[] = {4, 5, 1, 2, 3}.
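A minimal C++ sketch of this reverse-based rotation (the helper reverse_range and the function rotate_by_k are my own names, not from the original post):

/* reverse a[l..r] in place */
void reverse_range(int a[], int l, int r)
{
   while(l < r)
   {
      int tmp = a[l]; a[l] = a[r]; a[r] = tmp;
      l++; r--;
   }
}

/* rotate a[0..n-1] to the right by k positions: O(n) time, O(1) space */
void rotate_by_k(int a[], int n, int k)
{
   if(n == 0) return;
   k = k % n;
   reverse_range(a, 0, n-1);   /* {1,2,3,4,5} -> {5,4,3,2,1} */
   reverse_range(a, 0, k-1);   /* -> {4,5,3,2,1} */
   reverse_range(a, k, n-1);   /* -> {4,5,1,2,3} */
}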

Thursday, March 22, 2012

About the Josephus Problem

Problem: The Josephus problem is a classical problem that can be solved by DP. The problem assumes n persons standing in a circle; starting from the 1st person, we eliminate a person after skipping a fixed number of persons. The process stops when only one person is left, and we want to know that last person's index in the original circle. For example, if n = 5 and k = 2, at the beginning we have 1 2 3 4 5; we first eliminate 2 and 4, then eliminate 1 and 5, so the last person's index is 3.

Solution: For the general case where k is an arbitrary number, we use F(n, k) to denote the solution for n persons. Then we have F(n, k) = (F(n-1, k) + k) mod n, using 0-based indexing with base case F(1, k) = 0 (add 1 to recover the 1-based index used in the example). Why? We can think of the problem as follows:

  • The first person we need to eliminate is the kth person; after he is removed, we have n-1 persons left.
  • However, we cannot directly resort to the subproblem F(n-1, k), because the process now restarts at the (k+1)th person of the original circle. Therefore we need to remap the indices. Since the process is applied on a circle, we can move the first k-1 persons (the 1st, 2nd, ..., (k-1)th persons of the original circle) to the end of the circle. Thus, if the answer's index in the subproblem F(n-1, k) is i, the corresponding index in F(n, k) is (i + k) mod n.
The above approach takes O(N) time. For small k, O(logN) solutions actually exist; for example, when k = 2 we have F(2n) = 2*F(n) - 1 and F(2n+1) = 2*F(n) + 1. Actually, for k = 2 even an O(1) solution exists: if n = 2^m + l where 0 <= l < 2^m, the last person will be 2*l + 1.
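A minimal C++ sketch of the O(N) recurrence (the function name josephus and the conversion back to a 1-based index are my own choices):

/* returns the 1-based index of the survivor among n persons, eliminating every kth */
int josephus(int n, int k)
{
   int res = 0;                 /* F(1, k) = 0, using 0-based indexing */
   for(int i = 2; i <= n; i++)
      res = (res + k) % i;      /* F(i, k) = (F(i-1, k) + k) mod i */
   return res + 1;              /* convert back to the 1-based index */
}

For the example above, josephus(5, 2) returns 3.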

Monday, March 12, 2012

Calculate the Number of "1" in the Binary Form of a Positive Integer

Problem: Given a positive integer, count the number of "1" in its binary form. For example, the binary form of "7" is "111", so the number of "1" is 3.

Solution: There is an elegant solution with a complexity of O(k), where k is the number of "1" bits. The code is as follows:

int count_numof1(int n)
{
   int cnt = 0;

   while(n)
   {
      n = n & (n-1);   /* clears the lowest set bit of n */
      cnt++;
   }

   return cnt;
}

Sunday, March 11, 2012

Calculate log2() Using sqrt()

Problem: Calculate log2()  by only using sqrt().

Solution: The key here is that ln(1 + u) ≈ u when u is very close to 0, so log2(x) ≈ (x - 1)/ln2 ≈ (x - 1)/0.69 when x is very close to 1. Also, log2(x) = 2 * log2(sqrt(x)), so every call to sqrt() doubles the factor we have to multiply back at the end. Therefore the algorithm first uses sqrt() repeatedly to bring x very close to 1, and then applies the approximation.
#include <math.h>
#include <assert.h>

/* compute log2(x) using only sqrt() */
double log2_by_sqrt(double x)
{
   assert(x > 0);

   const double threshold = 1e-10;
   double res = 1.0;

   /* log2(x) = 2 * log2(sqrt(x)): keep taking square roots until x is close to 1 */
   while(fabs(x - 1) > threshold)
   {
      x = sqrt(x);
      res = res * 2;
   }

   /* for x close to 1, log2(x) ~= (x - 1) / ln2 */
   return res * (x - 1) / 0.693147;
}

Find the Minimum Set of Numbers That Sum to at least K.

Problem: Given an array of integers, find a smallest subset (one with the minimum number of integers) that sums to at least K. For example, if a[] = {2, -5, 7, 8, 12, 4} and K = 13, then one such minimum set is {8, 12}.

Solution: One solution is to sort the array first, then starting from the largest integer, accumulate the sum until it is no less than K. This solution (depending on which sorting algorithm you use) usually costs O(nlogn). Here we introduce a selection-based algorithm which costs O(n) on average but O(n^2) in the worst case. The approach is similar to finding the kth order statistic.

  • Randomly choose a pivot from the current array and partition the array into two parts p1 and p2: p1 contains the integers that are smaller than the pivot; p2 contains the integers that are larger than or equal to the pivot.
  • Check if pivot + sum(p2) == K: if so, we have already found the answer (the pivot plus p2). If not: if pivot + sum(p2) > K but sum(p2) < K, we have also found the answer (again the pivot plus p2); if sum(p2) >= K, we need to partition further, but only within p2; if pivot + sum(p2) < K, we also need to partition further, but within p1 instead, and the target value K is updated to K - pivot - sum(p2).
The code is as follows:
void find_min_set(int *a, int start, int end, int target)
{

    int s = start;
    int e = end;
    int k = target;

    int sum;

    srand(time(NULL));   /* seed the random number generator once, not in every iteration */

    while(1)
    {
       /* randomly pick a pivot index in [s, e] */
       int idx = rand() % (e - s + 1) + s;
       
       /* partition and sum */
       int tmp = a[idx];
       a[idx] = a[e];
       a[e] = tmp;

       sum = a[e];
       int pos = s;
       for(int i = s; i < e; i++)
       {
          if(a[i] < a[e])
          {
             if(pos != i)
             {
                tmp = a[pos];
                a[pos] = a[i];
                a[i] = tmp;
             }
            pos++;
          }
          else sum+= a[i];

       }

      tmp = a[pos];
      a[pos] = a[e];
      a[e] = tmp;

      if(sum == k)
      {
        /* the pivot plus p2 (i.e., a[pos..end]) sums to exactly k */
        cout << "found " << endl;
        return;
      }
      else if(sum > k)
      {
         if(sum - a[pos] < k)
         {
            /* p2 alone is not enough, but pivot + p2 is: answer found */
            cout << "found " << endl;
            return;
         }
         /* sum(p2) >= k: keep partitioning, but only within p2 */
         s = pos + 1;
         sum = 0;
      }
      else
      {
        /* pivot + p2 is still short of k: commit them and keep partitioning p1 */
        k = k - sum;
        sum = 0;
        e = pos - 1;
      }
    }

}

                   

Wednesday, March 7, 2012

Calculate Number of Different Ways to Climb a Stairway with Constant Space and Linear Time

Problem: There are three ways to climb stairs: you can climb 1 step, 2 steps, or 3 steps at a time. Given an n-step stairway, in how many different ways can you climb to the top?

Solution: This is a simple DP problem. If we maintain an array dp[] with dp[i] representing the number of different ways to climb up i steps, we can get the final result in O(N) time. However, we then need O(N) space as well. But since dp[i] = dp[i-1] + dp[i-2] + dp[i-3], we don't actually need O(N) space; a cache of size 3 is enough. Then we have an algorithm with O(1) space complexity:

int climb_stair(int n)
{
  int cache[3];
  cache[0] = 1; /* ways to climb 0 steps */
  cache[1] = 1; /* ways to climb 1 step */
  cache[2] = 2; /* ways to climb 2 steps */

  if(n <= 2) return cache[n];

  for(int i = 3; i <= n; i++)
  {
     /* cache[i%3] currently holds dp[i-3]; overwrite it with dp[i] */
     cache[i%3] = cache[0] + cache[1] + cache[2];
  }

  return cache[n%3];
}

Similarly, we can have an O(1) space algorithm for Fibonacci numbers.
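For instance, a minimal sketch of the same idea for Fibonacci numbers (the function name fib is my own; it assumes fib(0) = 0 and fib(1) = 1):

int fib(int n)
{
   if(n == 0) return 0;

   int prev = 0, cur = 1;   /* fib(0) and fib(1) */
   for(int i = 2; i <= n; i++)
   {
      int next = prev + cur;
      prev = cur;
      cur = next;
   }

   return cur;
}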

User Bin Crash problem

Problem: This is a problem from Facebook buzz. The problem is that a plane is going to crash, and you need to throw out at least W lbs of freight to survive the crash. Each type of freight fi is associated with a unit weight wi and a unit price pi. If we throw one unit of fi out, we lose pi but the plane becomes lighter by wi. Our mission is to throw out freight of minimum aggregate price while avoiding the crash, which means the total weight of the freight thrown out should be at least W.

Solution: The problem is very similar to the 0-1 knapsack problem, and we can use a similar strategy to solve it. The details are as follows:

  • Basically, we need to fill an array dp[n][W+1], where n is the number of different types of freight.
  • dp[i][j] represents the minimum cost to throw out at least j lbs of freight using only the first i types of freight.
  • To fill dp[i][j], we either do not throw out any unit of the ith type, or we throw out one more unit of it: if j <= the ith type's unit weight, dp[i][j] = min(dp[i-1][j], ith type's unit price); otherwise, dp[i][j] = min(dp[i-1][j], dp[i][j - ith type's unit weight] + ith type's unit price). We do not actually need to sort the types or use a greedy selection; a regular DP is enough (see the sketch after this list).
  • The time complexity of this approach is O(nW).
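A minimal C++ sketch of this DP, assuming the unit weights w[] and unit prices p[] are given as vectors and that any number of units of each type may be thrown out (the function name and the INF sentinel are my own choices):

#include <vector>
#include <algorithm>
#include <climits>
using namespace std;

/* minimum total price of freight to throw out so that at least W lbs are removed */
long long min_crash_cost(const vector<int>& w, const vector<int>& p, int W)
{
   const long long INF = LLONG_MAX / 2;
   int n = w.size();

   /* dp[j] = minimum cost to throw out at least j lbs (rolling array over freight types) */
   vector<long long> dp(W + 1, INF);
   dp[0] = 0;

   for(int i = 0; i < n; i++)
      for(int j = 1; j <= W; j++)
      {
         int rest = max(0, j - w[i]);          /* "at least j lbs" clamps at 0 */
         if(dp[rest] != INF)
            dp[j] = min(dp[j], dp[rest] + p[i]);
      }

   return dp[W];   /* INF means the goal is unreachable (e.g., all weights are 0) */
}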

About Disjoint-Set Forests

Disjoint-Set Forests are very useful for solving problems such as finding connected components in a graph. For example, suppose we have several sets {a, b, c}, {d, e}, {f}. The elements in a set are vertices that are connected to each other. If we have an edge e(b, f), then we can merge the sets {a, b, c} and {f}, since their elements are now connected.

The common operations on a disjoint-set structure are Make-Set, Union and Find-Set. Here Union is the operation that merges two sets, and Find-Set, given an element, finds out which set that element is in. To model the disjoint-set problem, we can use a forest in which each tree is a set. The representative of a set is the root of its tree, and each element in a tree has a parent pointer. To do a Union, we simply make one tree a subtree of the other. To do a Find-Set, we start from the given element and follow parent pointers until we reach an element whose parent is itself.

However, to make these operations efficient, two optimizations can be made (a short sketch combining both follows this list):

  1. Union by rank: In order to keep Find-Set efficient, we need to keep the depth of each tree small. To do that, for each tree we maintain a rank, which records the depth of the tree. When we Union two trees, we make the tree with the smaller rank a subtree of the tree with the bigger rank; basically, we make the parent pointer of the root of the smaller-ranked tree point to the root of the bigger-ranked tree.
  2. Path compression: To further improve Find-Set, the idea is that after a Find-Set we have just walked a path from the query element up to the root. We can then make every element on this path point its parent pointer directly to the root, which shortens the path for future queries.
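A minimal array-based C++ sketch with both optimizations (the array names parent and rnk and the fixed capacity are my own choices):

const int MAXN = 100000;
int parent[MAXN];
int rnk[MAXN];

void make_set(int x)
{
   parent[x] = x;   /* each element starts as the root of its own tree */
   rnk[x] = 0;
}

int find_set(int x)
{
   if(parent[x] != x)
      parent[x] = find_set(parent[x]);   /* path compression: point directly to the root */
   return parent[x];
}

void union_sets(int x, int y)
{
   int rx = find_set(x), ry = find_set(y);
   if(rx == ry) return;

   /* union by rank: attach the shallower tree under the deeper one */
   if(rnk[rx] < rnk[ry])
      parent[rx] = ry;
   else if(rnk[rx] > rnk[ry])
      parent[ry] = rx;
   else
   {
      parent[ry] = rx;
      rnk[rx]++;
   }
}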

Tuesday, March 6, 2012

Convert an Array into a Sorted Array with Minimum Cost

Problem: Given an array of positive integers, you have only two operations: a) decrease an integer by k with a cost of k; b) delete an integer with a cost equal to the value of that integer. Use these two operations to convert the array into a sorted (non-decreasing) array with minimum cost. For example, for a[] = {5, 4, 7, 3}, the optimal way is to decrease "5" by 1 and delete "3", with a total cost of 4 (1 + 3). The resulting sorted array is {4, 4, 7}.

Solution: Here I give an O(N^2) solution using DP. The key data structure we need is dp[i], which records the minimum cost to convert the subarray a[0..i] into a sorted array whose last element is a[i]. For the example array a[] = {5, 4, 7, 3}: dp[0] = 0, dp[1] = 1 (since we need to decrease "5" by 1), dp[2] = 1, dp[3] = 7. After we have filled up dp[], we need one more data structure: aggr[], where aggr[i] = a[0] + a[1] + ... + a[i]. Then the minimum cost to convert the whole array is cost_min = min{dp[i] + aggr[N-1] - aggr[i]}. Here "aggr[N-1] - aggr[i]" is the cost of deleting all the elements with a subscript larger than i.

Then the key is to calculate dp[i]. There are several observations here:
  1. If a[i] >= a[j] (j < i), we can construct a sorted array ending at a[i] by deleting a[j+1], a[j+2], ..., a[i-1] and then appending a[i]. The cost of the resulting sorted array is dp[j] plus the cost of deleting a[j+1], a[j+2], ..., a[i-1].
  2. If a[i] < a[j], to have a[i] as the last element we need to decrease a[j] by a[j] - a[i]. Moreover, we may not be able to stop at a[j]: we have to keep looking backward until we find an a[m] such that a[m] <= a[i]. The cost of the resulting sorted array is then dp[m+1] + ∑(a[k] - a[i]) for m < k <= j, plus the cost of deleting a[j+1], ..., a[i-1].
  3. To compute dp[i], we look at every j < i and, depending on a[j] and a[i], calculate the candidate cost based on case 1 or 2. The minimum candidate cost we encounter is the value of dp[i].
The code is as follows:
int min_convert_cost(int a[], int n)
{
    int dp[n];
    int aggr[n];

    aggr[0] = a[0];
    for(int i=1; i<n; i++)
        aggr[i] = aggr[i-1] + a[i];

    dp[0] = 0;
    for(int i=1; i<n; i++)
    {
       dp[i] = INT_MAX;
       for(int j=i-1; j>=0; j--)
       {
          int cost_i = 0;

          if(a[i] >= a[j])
          {
             cost_i = dp[j] + aggr[i-1] - aggr[j];
          }
          else
          {
              cost_i += aggr[i-1] - aggr[j];
              while(j >= 0 && a[j] > a[i])
              {
                cost_i += a[j]-a[i];
                j--;
              }
              
              cost_i += dp[j+1];

          }
          if(cost_i < dp[i])
               dp[i] = cost_i;

       }
    }

   int min = INT_MAX;
   for(int i=0; i<n; i++)
   {
      int cost_i = dp[i] + aggr[n-1] - aggr[i];
      if(cost_i<min) min = cost_i;
   }

   return min;
}



Monday, March 5, 2012

String Reduction

Problem: We have a string that contains only three types of letters: "a", "b", "c". Our goal is to reduce the string. If two different letters are adjacent to each other, we can reduce them to the third letter. The rule is: if we encounter "ab" (or "ba"), we reduce it to "c"; if we encounter "bc" (or "cb"), we reduce it to "a"; if we encounter "ca" (or "ac"), we reduce it to "b". If two adjacent letters are the same, we can't reduce them. We need to find an optimal way to reduce the string as much as possible.

For example, for the string "acbb": if we reduce "ac" first, we get "bbb" and are stuck. But if we reduce "cb" first, we get "aab", and we can further reduce "aab" => "ac" => "b". The latter approach gives an optimal reduction.

Solution: What really matters here are the counts of the letters "a", "b" and "c". Let us name them num_a, num_b and num_c. If two of them are zero, we can't reduce the string at all. Otherwise, we can reduce the string to a string of length 1 or 2: if num_a, num_b and num_c are all even or all odd, we can reduce it to a string of length 2; otherwise, we can reduce it to a string of length 1.

Then how do we do the reduction? The details are as follows:

  1. If the string has length 3 and contains one "a", one "b" and one "c", then whatever order the three letters are in, we can only reduce it to a string of length 2. If the string has length 2 and contains two different letters, we can reduce it to a string of length 1. Let us regard these two cases as base cases.
  2. For the general case, we have num_a "a"s, num_b "b"s and num_c "c"s. Every reduction decreases num_a + num_b + num_c by 1, since we substitute one letter for two.
  3. Then at each round, which two adjacent letters do we choose to reduce? We try to find an adjacent pair that contains a letter with the highest count. For example, if we currently have 3 "a"s, 4 "b"s and 6 "c"s in the string, we choose an adjacent pair that contains "c", since num_c = max(num_a, num_b, num_c). Can we find such a pair? Definitely, as long as the string contains at least two different letters. If there are multiple such pairs, choose any one. After we reduce this pair, num_c decreases by 1, while max(num_a, num_b, num_c) may decrease by 1, remain unchanged, or increase by 1. However, max(num_a, num_b, num_c) is bounded above by num_a + num_b + num_c, which decreases after every round, so max(num_a, num_b, num_c) must also decrease in the long run. Therefore, by using this strategy, max(num_a, num_b, num_c) will eventually decrease to 1, which means we reach the base cases of step 1.
  4. Then when will the string be reduced to length 1 and when to length 2? We observe that if num_a, num_b and num_c are all even, then after one reduction they become all odd; similarly, if they are originally all odd, after one reduction they become all even. According to the analysis in step 3, max(num_a, num_b, num_c) will eventually decrease to 1, and at that point the counts must still all be odd (since "1" is odd). Therefore, at the very end we will have num_a = 1, num_b = 1 and num_c = 1, which leads to a final string of length 2. It is easy to show in the same way that if num_a, num_b and num_c are not all even or all odd, the string will be reduced to length 1.
  5. If num_a, num_b and num_c are not all even or all odd, there are only two cases: a) two of the counters are odd and one is even; b) two of the counters are even and one is odd. For example, if num_b is odd while num_a and num_c are both even, the string will eventually be reduced to "b".
  6. If num_a, num_b and num_c are all even or all odd, there might be multiple different final forms.
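A minimal C++ sketch of just the length computation based on the counting argument above (producing the final string itself would require simulating the reductions; the function name is my own):

#include <string>
using namespace std;

/* smallest length the string can be reduced to, based on the parity argument */
int min_reduced_length(const string& s)
{
   int cnt[3] = {0, 0, 0};
   for(size_t i = 0; i < s.size(); i++)
      cnt[s[i] - 'a']++;

   /* if two counters are zero, the string is one repeated letter: no reduction possible */
   if((cnt[0] == 0 && cnt[1] == 0) || (cnt[1] == 0 && cnt[2] == 0) || (cnt[0] == 0 && cnt[2] == 0))
      return s.size();

   /* all even or all odd -> length 2; otherwise -> length 1 */
   bool same_parity = (cnt[0] % 2 == cnt[1] % 2) && (cnt[1] % 2 == cnt[2] % 2);
   return same_parity ? 2 : 1;
}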

Find the Minimum Vertex Cover for a Tree

Problem: Given a tree, find its minimum vertex cover. Wait, what is a vertex cover? Given an undirected graph G(V, E), a vertex cover is a subset of V such that for any edge e in E, at least one of e's two endpoints is in this subset (the vertex cover).

Solution: Finding the minimum vertex cover of a general graph is an NP-hard problem. However, for a tree there is a linear-time solution. The idea is to do a DFS with post-order processing. When we encounter a leaf node and the edge connecting this leaf to its parent, we know that in order to construct a vertex cover we must include at least one of the two nodes (the leaf or its parent). Here we can use a greedy approach: selecting the leaf doesn't give us any extra benefit, while selecting the parent can, since the parent may also be connected to other nodes; by selecting the parent we may "cover" some extra edges. With this strategy in mind, our algorithm is as follows:

  • We do a DFS. When a DFS call on a child node returns, we check whether the child and the parent are both unselected. If so, we select the parent node.
  • After the whole DFS finishes (we have traversed the tree), the selected nodes form a minimum vertex cover. The cost is O(N).
The pseudo code is as follows:
void min_vertex_cover(TreeNode *root)
{
   if(isLeaf(root)) return; 

   for(int i=0; i<root->num_of_children; i++)
   { 
        min_vertex_cover(root->children[i]);

        if(!root->selected && !root->children[i]->selected)
               root->selected = true;
         
   }
}

Friday, March 2, 2012

Find the Diameter of a Convex Polygon

Problem: Given a convex polygon with n vertices, find its diameter, i.e., the pair of vertices with the longest distance between them.

Solution: This is the essential subproblem for the more general problem of finding the farthest pair among n points, but for now let's focus on this problem. Assume the vertices of the convex polygon are ordered counter-clockwise (clockwise is also OK) and labeled v1, v2, ..., vn. We can use a method that mimics "rotating" the polygon to solve the problem in O(n).

First we need to introduce the concept of an antipodal pair: the diameter must be the distance between some antipodal pair. Therefore, if we find all antipodal pairs, we can get the diameter. Then how do we find all the antipodal pairs efficiently? The details are as follows:

  1. For edge vnv1 (the edge that connects vn and v1), we follow the counter-clockwise order to find the vertex that is farthest from vnv1. We name it vk.
  2. Then we move to edge v1v2 and again find the vertex that is farthest from v1v2. We name it vp. It is easy to see that every vertex from vk to vp forms an antipodal pair with v1, so we add the pairs (v1, vk), ..., (v1, vp) to our antipodal set.
  3. Then we let vk = vp and move to edge v2v3. Again we find the vertex that is farthest from v2v3; this vertex becomes the new vp. We repeat this process until we finish processing the edge vn-1vn.
  4. Now we have all the antipodal pairs; just do a linear scan over them to find the maximum distance.
When calculating which vertex is farthest from an edge, we can use the signed area, which avoids square-root calculations.
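For example, a small sketch of the signed-area trick: twice the signed area of the triangle formed by an edge and a vertex is proportional to that vertex's distance from the edge, so distances from the same edge can be compared without square roots (the struct and function names are my own):

struct Point { long long x, y; };

/* twice the signed area of triangle (a, b, c); its absolute value is proportional
   to the distance from c to the line through a and b */
long long cross(const Point& a, const Point& b, const Point& c)
{
   return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}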

More: To solve the farthest-pair-of-points problem, we first find the convex hull of the points and then apply this diameter algorithm. The total cost is O(nlogn), which is dominated by the cost of finding the convex hull.

Thursday, March 1, 2012

Decide If a Point is Inside a Simple Polygon

Problem: Given a point x, decide whether x is inside a simple polygon P. P is given by its vertices {p1, p2, ..., pn}.

Solution: There exists an O(N) solution based on one important observation: a ray from x crosses the boundary of P an odd number of times if and only if x is inside P. Here a ray is a half-line that starts from a point. Before designing an algorithm based on this observation, there are two cases we need to be cautious about. One is when the ray passes through some vertices of P; in that case we need another way to define the number of times the ray crosses the boundary of P. The second case is when x is actually on the boundary of P, but this is easy to detect. The skeleton of the algorithm is as follows:

  1. Decide whether x is on the boundary of P; if so, we can answer immediately; otherwise continue to step 2. This step takes O(N) since there are N edges.
  2. Pick a ray that starts from x (we can just use a vertical ray; a segment from x that extends past the lowest y coordinate of the polygon is enough). For each edge of P, check whether the ray intersects that edge; each such check takes O(1), and the total is O(N). Here we need to handle the special case where the ray passes through some vertices of P. Assume i1, i2, ..., ik are the points at which the ray intersects the boundary: if none of them is a vertex of P, we define the number of crossings simply as k. If some of them are vertices, counting is trickier, since a vertex is shared by two edges; we count each such vertex only once (rather than once per incident edge) and add the number of non-vertex intersection points. An implementation that sidesteps the vertex case with a half-open edge test is sketched below.
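A minimal C++ sketch of the even-odd (ray-crossing) test; here the ray is horizontal (a vertical ray works symmetrically), and the half-open comparison on the y coordinates makes vertex hits count consistently, so no separate vertex handling is needed. The function and parameter names are my own, and points exactly on the boundary should still be handled by step 1:

/* returns true if point (x, y) is strictly inside the polygon given by
   (px[0], py[0]), ..., (px[n-1], py[n-1]); even-odd rule with a horizontal ray */
bool point_in_polygon(const double px[], const double py[], int n, double x, double y)
{
   bool inside = false;
   for(int i = 0, j = n - 1; i < n; j = i++)
   {
      /* edge (j, i) crosses the horizontal line through y exactly when its two
         endpoints lie on different sides of it (half-open, so vertices count once) */
      if((py[i] > y) != (py[j] > y))
      {
         /* x coordinate where the edge crosses height y */
         double xcross = px[i] + (y - py[i]) * (px[j] - px[i]) / (py[j] - py[i]);
         if(x < xcross) inside = !inside;
      }
   }
   return inside;
}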

Find the Greatest Common Divisor (GCD)

Problem: Given two non-negative integers a and b, find their greatest common divisor.

Solution: A very efficient algorithm is Euclid’s algorithm. The main idea is that GCD(a, b) = GCD(b, a mod b). Therefore, the code is straightforward:

int GCD(int a, int b)
{
   if(b == 0) return a;

   return GCD(b, a % b);
}