Tuesday, June 28, 2011

Reservoir Sampling

Reservoir sampling is a sampling algorithm that randomly pick k elements out of a set of n elements. It is adapted from Knuth Shuffling. The key idea is to maintain a buffer with size k, then replace the elements in buffer with a descending probability. The application of this algorithm is when sampling a finite subset from streaming data. Reservoir sampling is very useful for online sampling on streaming data.

int buf[k];
//initialize the buffer
for(int i=0; i<k; i++)
   buf[i] = a[i];

for(int i=k; i<n; i++)
{
   int idx = random(0, i);
   if(idx<k) swap(buf[idx], a[i]);
   
}

No comments:

Post a Comment