Tuesday, January 24, 2012

Find the Largest Container in a Histogram

Problem: Given a histogram, pick two bars in the histogram as the left and right sides of a container. X axis is consider as the bottom. Then we have a container which can hold water. Find the container that can hold the largest volume of water. The container must be placed horizontally (you can't rotate container).

Solution: We can still use DP to solve this problem in O(n). Basically, there are two factors decides the volume of the container: 1) the minimum of the two sides 2) how wide the container is. If the highest bars are just at the left and right end of the histogram, we know that by picking these two bars we can have the largest container. If the bars at the left and right end are not the two highest, we need to explore further to look at other combinations. Based on these observations, what we need is actually an increasing sequence from left to right and  also an increasing sequence from left to right. The details of the algorithm is as follow:

  • First model the histogram as an array h[i]. For example, h[] = {3, 1, 4, 7, 5, 2, 6}. 
  • Then find the two increasing sequences. The one from left to right is left[] = {3, 4, 7} and the one from right to left is right[] = {6, 7}.
  • begin with the first elements in left[] and right[], calculate the volume of the container formed by these two bars, use max_v to store this volume (which is 3*5=15). 
  • Then select the smaller one of the two bars, make one advancement in the array which the smaller bar is from. For our example, between left[0] and right[0], left[0] is the smaller one. Therefore we advance to left[1], which is "4". Then we calculate the volume of the container formed by left[1] and right[0]. The volume is 4*3=12, which is smaller than the previous value 15, so we just keep the old value. 
  • Then we repeat the previous step, just advance the array which the smaller bar is from. We will get   left[2] and right[0]. The corresponding volume is 6*2 = 12, which is still smaller than 15.
  • Then we need to advance in right[], we will find right[1] and left[2] refer to the same bar. Since at this time, both left[] and right[] have been exhausted, our algorithm just abort.
  • One more thing, when two bars  left[i]  == right[j], we need to advance both arrays and inspect   left[i+1] and right[j+1] next.

Monday, January 23, 2012

Make Best Use of the Conference Room

Problem: Given a conference room and a number of presentations with start and end time ( e.g., [4, 9], [5, 10]), try to make an arrangement which allows the conference room to be used for maximum time. Overlapping presentations can't be in the same arrangement.

Solution: We can have a O(nlogn) solution by using DP.  The details are as follows:

  • Sort the presentations by their end time. Thus we will have a sorted array end[N].   N is the number of the presentations.
  • Have an array Max_arr[N].  Max_arr[i] stands for if we take end[i] as the close time for the conference room, the maximum time the room can be used. It is easy to know that  Max_arr[0] = the duration of the presentation that ends at  end[0]. We need to find out Max_arr[N-1].
  • To calculate Max_arr[i], we first get the start time si of the presentation that ends at end[i]. Then we do binary search in end[0] ... end[i-1] for si. Basically we need to find a j such that  end[j] < si &&  end[j+1] > si. Therefore,  Max_arr[i] = max(Max_arr[i-1],  Max_arr[j] + end[i] -  si ).

Zeckendorf's Theorem -- How to Represent a Positive Integer with Fibonacci Numbers?

Zeckendorf's Theorem is about that every positive integer can be represented in the form of the sum of one or more Fibonacci numbers.

For example, 6 = 3 + 2 + 1, 10 = 8 + 2. We can further represent these positive integers in the binary form of Fibonacci numbers. Thus, 6 can be represented as "111" which means fibo(1) +  fibo(2) + fibo(3) -- the sum of the first, second and third Fibonacci numbers. Similarly, 10 can be represented as "10010" which means fibo(2) +  fibo(5)  --  the sum of the fsecond and fifth Fibonacci numbers.

How to find the Fibonacci numbers that sum to a particular positive integer? Basically you can use DP and leverage the binary form.

  • 1 can be represented as "1", 2 as "10", 3 as "11", 4 as "101", 5 as "110". Thus, we can observe the binary form of a positive integer i is just the immediate next number of the binary form of a i-1. However, this "immediate next" is slightly different when an integer has a binary form of all "1". For example, 3 is "11". The  "immediate next" of "11" is not "100" but "101". This is due to the nature of Fibonacci numbers. We know "100" = "11" since fibo(n) = fibo(n-1) + fibo(n-2). So here we need to add one more "1" to "100".
  • Based on the above analysis, we can get those Fibonacci numbers for a particular positive integer by using DP.
Actually not only positive integers, negative integers can also be represented in the sum of "broadly defined" Fibonacci numbers.

    Thursday, January 19, 2012

    Total Area Covered by a Set of Overlapping Segments

    Problem: Given a set of overlapping segments, output the total length they covered. For example, for segments [1,3], [2,5], [6,7], the total length is 5.

    Solution: Sweeping line is the right solution for this type of problem. Also the algorithm to this problem is the building block to solve the problem of total area covered by as set of overlapping rectangles. The following gives the detail of the algorithm:

    • First, sort the points (start and end) of all the segments. 
    • Have a event queue and a counter cnt.   cnt is used to store the number of  active segments. When encountering a point p, first look at cnt, if  cnt > 0, there must be a previous point p' (  p' is the point in the queue that is immediately before  p) in the queue, add  p'-p to the total length covered.  Then add the current point p into the queue. If it is a start point, increase cnt by 1.   If it is an end point, decrease cnt by 1.  
    • After iterating all the end points, we get the total length covered. We can also choose not to use cnt. Then we need to watch the size of the queue. Besides, when encountering an end point, we need also to remove some corresponding points in the queue, which is more complicated.

    Longest Subarray with Equal "1" and "0"

    Problem: Given an array that only contains "1" and "0", find the longest subarray which contains equal number of "1" and "0".

    Solution: With hash table, we can have a O(N) solution. The detail is as follow:

    • First convert all "0" to "-1", then calculate c[i] = sum(a[0], ... , a[i]). It takes O(N) to calculate all the c[i].
    • Then our task is to find a c[i] and a c[j] such that  c[i] = c[j]  and |j-i| is maximum. With a hash table, we can finish this job by doing a linear scan with a time complexity of   O(N).
    • There is a special case you need to handle. When c[N-1] = 0 (assume N is the size of a), the longest subarray is just a itself.

    Implement Text Editor with Gap Buffer

    Gap Buffer is a data structure which can be used to implement text editor. The advantage is that insertion/deletion can be very efficient and the data structure is simple. The disadvantage is when cursor changes its location frequently or the gap is frequently full, there will be a lot of copying operation, which is costly. The following gives more details about gap buffer.

    • Basically gap buffer can be implemented as an array (or a dynamic array) with some pointers to differentiate three regions: the segment before the gap, the gap, the segment after the gap.
    | the front segment  |   the gap  |  the back segment  |
    • The location of the cursor in the text editor decides the border between the front segment and the gap. A example is given here:
    We had a [            ] big day
    • As the above example, "We had a" is the front segment, "[   ]" is the gap buffer and "big day" is in the back segment. 
    • When insertion, new text is filled in the gap buffer, the start pointer of the gap buffer also moves accordingly. If the gap is full, we need to create new gap buffer and we may also need to move all the content in the back segment backwards.
    • When deletion,  we only need to move the start pointer of the gap buffer forwards if we are deleting the text in the front segment or  the end pointer of the gap buffer backwards if we are deleting the text in the back segment.
    • If we move the cursor, for example, the cursor is between "We" and "had" now:  "We [     ] had a big day". We need to copy the "had a" which was originally in the front segment to the back segment.

    Wednesday, January 18, 2012

    Rabin–Karp Algorithm

    Rabin–Karp algorithm is yet another algorithm to do pattern search in a string (also see KMP algorithm). The problem is given a target string s and some pattern p (p is also a string), how to efficiently decide if s contains p as substring.

    The key of Rabin-Karp algorithm is to use hash, more specifically, rolling hash. Then with hash, when checking the match of a substring of  s  to the pattern p, we don't need to compare each character of s and  p. Instead, we just compare hash(substring of s) and hash(p). If the hash values are not the same, the substring doesn't match with the pattern. If same, we still need to further inspect every character of  the substring and the pattern, since hash function can have collisions. However, this further check is not expected to be often.

    Then the problem is how to efficiently calculate the hash of all the substrings of  s with a specific length. The rolling hash just kicks in here. One simple example of rolling hash is to use the sum of all the ASCII value of the characters in the substring as the hash value, e.g., s[i] + s[i+1] + ... + s[i+m-1] for the substring s[i... i+m-1]. Then for the substring s[i+1... i+m], we can reuse the hash value we just calculated for the substring s[i... i+m-1]. Assume it is h, then the hash value for substring s[i+1... i+m] is just h - s[i] + s[i+m]. In practice, more sophisticated rolling hash function is used. For example, using some large prime, the hash of "hi" can be ascii("h")*101^1 + ascii("i")*101^0

    If there is just one pattern, Rabin–Karp algorithm is not as good as KMP algorithm. Rabin–Karp algorithm is more useful in multi-pattern match. It is used together with a bloom filter. A bloom filter can be created to represent multi-pattern. Then if a substring has hash value hit in the bloom filter,  it is probably there is a match.




    Dynamic Programing Solution to Subset Sum Problem

    Problem: Given a set of integers, find a subset of them which sum to a target value s.

    Solution: We can definitely first sort the set, then use some recursive approach to solve the problem. The complexity is expected to be exponential. Here we give a DP approach, though the complexity is also not polynomial.

    • First we need to define the state. We have a state function Q(i,k) which means if we can find a subset from the set of integers a1,a2, ..., to ai which sum to k. So Q(i,k) is a Boolean function.
    • We have  Q(1,k) = ( a1 == k);  Q(i,k) =  Q(i-1,k)  or  ( ai == k) or  Q(i-1,k-ai).
    • Then we just need to fill in the matrix of  Q(i,k). We know 0< i <= the size of the set. For k, we have   N=< k <=P, where N is the sum of all the negative integers in the set and P is the sum of all the positive integers in the set.

    Tuesday, January 10, 2012

    Find K Elements with Maximum Minimum Consecutive Difference

    Problem: Given a sorted array A, find k elements from the array with the maximized minimum consecutive difference. For example, m1, ... , mk are selected elements. We want to maximize min(mi+1-mi). A more concrete example, a sorted array A = [1, 3, 4, 8] and k =3. There are four ways of picking 3 elements: {1,3,4}, {1, 4, 8}, {1, 3, 8}, {3, 4, 8}. The minimum consecutive difference for these four selections are 2, 3, 2, 1, respectively. The maximum of these minimum consecutive differences is 3, which is from selection {1, 4, 8}.


    Solution: [*will re-edit later*] This is problem is similar to given a line, try to divide the line into several segments as even as possible. It is also similar to the "Painter and Slates" problem (see a link here). The key here is to use binary search, which will give you O(nlog(A[n-1] - A[0])). Using DP definitely is OK, but the complexity is O(k*n^2).
    • Basically, the selection depends on the how "wide" the gap we use. If we use the gap as wide as A[n-1] - A[0] (n is the size of array), we can only select 2 elements. On the other way, if we choose the gap as min(A[i+1]-A[i]), we will select n elements. Our goal is to select k elements while with maximized minimum difference. k is within 2 and  n. Then a binary search can help us find k.
    • We just do binary search on the space of the width of the gap. We set low =  min(A[i+1]-A[i]), high =  A[n-1] - A[0] and make mid = (low + high) /2. 
    • With this mid as the width of the gap, we can calculate the number of elements we selected. If it is larger than k, we need to increase the width of the gap: we just update low = mid; Otherwise, we need to decrease the width of the gap by update high = mid.
    • We stop when we find a gap such that if we further increase the gap, we can't be able to select k elements.
    • Try to calculate m[i][j][k], where i and j are the index within A[] and k how many elements are selected with in the region A[i] ~ A[j]. The time to calculate this array is O(N^4).