Thursday, January 19, 2012

Longest Subarray with Equal "1" and "0"

Problem: Given an array that only contains "1" and "0", find the longest subarray which contains equal number of "1" and "0".

Solution: With hash table, we can have a O(N) solution. The detail is as follow:

  • First convert all "0" to "-1", then calculate c[i] = sum(a[0], ... , a[i]). It takes O(N) to calculate all the c[i].
  • Then our task is to find a c[i] and a c[j] such that  c[i] = c[j]  and |j-i| is maximum. With a hash table, we can finish this job by doing a linear scan with a time complexity of   O(N).
  • There is a special case you need to handle. When c[N-1] = 0 (assume N is the size of a), the longest subarray is just a itself.

6 comments:

  1. suppose given input is :
    1 1 1 0 0 0 1 0

    converting 0 to -1;

    1 1 1 -1 -1 -1 1 -1

    calculating c[],we get

    c[]= 1 2 3 2 1 0 1 0

    now could you please explain after this...
    bcozz there are many c[i]=c[j] nut no c[j-1] is maximum...please explain it bit more using this example..thanks

    ReplyDelete
    Replies
    1. The maximum we want to get is the value of |j-i|. Given your example, we need a hashtable here. We found out c[0] = c[6] = 1 and |j-i| = |6-0| =6 which is the maximum of |j-i|. Therefore, the longest sequence is a[1]~a[6]. However, this is not the correct answer to your example. The original post has some problem (thanks for your example). Here there is one more special case we need to think of. If the size of a is N and c[N-1] = 0, the longest sequences is just a (a[0]~a[N-1]. So the correct answer for your example is a[0]~a[N-1].

      Delete
    2. hey...can u explain dis algo for exmple {1,0,0,1,0,1,1}plz.... not able to get it

      Delete
    3. First convert {1,0,0,1,0,1,1} to {1, -1, -1, 1, -1, 1, 1}, then calculate c[] as {1, 0, -1, 0, -1, 0, 1}. Next create a hash table "hash". The key of the table is value in c[], the value is a pair of positions {start_pos, end_pos}. Init the table like this: hash[0] = {-1, N/A}. Then process c[0], we have hash[1] = {0, N/A}. Then process c[1], we have hash[0] = {-1, 1}, hash[1] = {0, N/A}. Then process c[2], we have hash[-1] = {2, N/A}, hash[0] = {-1, 1}, hash[1] = {0, N/A}. Then process c[3], we have hash[-1] = {2, N/A}, hash[0] = {-1, 3}, hash[1] = {0, N/A}. Then process c[4], we have hash[-1] = {2, 4}, hash[0] = {-1, 3}, hash[1] = {0, N/A}. Then process c[5], we have hash[-1] = {2, 4}, hash[0] = {-1, 5}, hash[1] = {0, N/A}. Then process c[6], we have hash[-1] = {2, 4}, hash[0] = {-1, 5}, hash[1] = {0, 6}. Then the largest distance is {-1, 5} and {0, 6}. These two refer to {1,0,0,1,0,1} and {0,0,1,0,1,0}

      Delete
  2. The algorithm you described does not have the worst-case complexity of O(n).

    Instead of hashing, one can sort the C array. And the scan the sorted list to find if there exists two cells with the same value or not (O(n\log n)).

    To obtain an O(n) algorithm, we can use this observation that the values in C array is in range (-n..+n). So, we can use a memory of size 2n+ 1 to check if there is a repeated value in C or not.

    ReplyDelete
    Replies
    1. The hash method does give you an O(n) solution. Just you need make some augment. The key-value pair in the hash table is {key, (first_pos, last_pos)}. For items in c[], you just do a linear scan. Then you can remember the first occurrence and last occurrence for each distinct value. One thing would affect O(n) is the quality of hash function, but here we assume hash gives us O(1).

      Your O(n) algorithm definitely works and I agree it is a good solution to this problem.

      Delete