Suppose you have a collection of n items and you want to choose k of them at random (without replacement). Assume that n is greater than k. Associated with each item x is a value y

_{x}which is real valued between 0 and 1. We have an additional constraint that y

_{1}+ ... + y

_{n}= k.

Devise a randomized algorithm to select k items such that, for each item x, the probability of x being selected is exactly y

_{x}.

**[EDIT 02/17/2008]**I figured that I should write down the solution somewhere before I forgot about it.

Consider a circle of circumference k. Each of the n items owns a contiguous segment of the circle, with the length of x

_{i}'s segment equal to y

_{i}. Clearly we have that the sum of all the segments is k, thus the entire circle is owned by some item.

Place k markers around the circle which are of uniform distance apart. So each marker is 1 unit distance away from its two neighbors. Now imagine spinning the circle like a wheel while the markers stay fixed. We can alternatively imagine rotating the markers while the circle stays fixed. After spinning, select the k items whose segments are touching a marker.

Proof of correctness. The proof takes two parts. We first prove that the number of items selected is k. We then prove that the probability of each item x

_{i}being selected is y

_{i}. Since the markers are 1 unit apart and the maximum length of a segment is 1, then each segment will almost surely be touching at most 1 marker. Therefore, the number of items that will be selected is k.

For item x

_{i}, consider the location of the center of its segment c

_{i}. For x

_{i}to be selected, c

_{i}needs to be within y

_{i}/2 distance from a marker (from either side of the marker). Therefore, we can consider the equivalent problem where each marker is the center of a segment (of the circle) of length y

_{i}, and x

_{i}is selected if, after spinning, its center c

_{i}falls inside one of the markers' segments. Viewing the problem this way allows for an easier probabilistic analysis. The total length of segments of the markers is k*y

_{i}. The total length of the circle is k. So the probability that x

_{i}is selected is exactly k*y

_{i}/k = y

_{i}.

## 2 comments:

Dude, that sounds like something I would get asked at a Google interview. I lost track of the question by the end.

I think the degree of difficulty of this problem is appropriate for a technical interview. I think interview questions typically don't deal with probability that much... something that I hope will change in the future.

Post a Comment