Here's a version I just wrote:
import random
"""
iterator is an Iterator/Generator object for a stream
and N >= k
"""
def reservoir(iterator, return_count):
arr = []
current = 0
for item in iterator:
if current < return_count:
arr.append(item)
else:
rand = random.randint(0, current)
if rand <= return_count:
arr[rand] = item
current += 1 return arr
Once I read it, I noticed that it is strikingly similar to an inside out Fisher-Yates Shuffle (which is how you you return a random permutation of a list from a list)That would look like:
import random
def shuffle(cards):
deck = []
current = 0
for card in cards:
rand = random.randint(0, current)
if rand != current:
deck.append(deck[rand])
deck[rand] = card
return deck
That said, with reservoir, you won't have a random permutation of the elements. And shuffle uses extra memory if the length of the card pile is very long, and you only need some k cards.
Thus:
import random
def resshuffle(iterator, return_count):
arr = []
current = 0
while (current < return_count):
rand = random.randint(0, current)
if current < return_count:
arr.append(arr[rand])
arr[rand] = iterator.next()
current += 1
for item in iterator:
if rand <= return_count:
arr[rand] = item
current += 1
return arr