Sunday, January 13, 2008

Random Math Puzzle

This was something I came across during one of my recent forays into useless mathematics.

Suppose you have a collection of n items and you want to choose k of them at random (without replacement). Assume that n is greater than k. Associated with each item x is a value yx which is real valued between 0 and 1. We have an additional constraint that y1 + ... + yn = k.

Devise a randomized algorithm to select k items such that, for each item x, the probability of x being selected is exactly yx.

[EDIT 02/17/2008] I figured that I should write down the solution somewhere before I forgot about it.

Consider a circle of circumference k. Each of the n items owns a contiguous segment of the circle, with the length of xi's segment equal to yi. Clearly we have that the sum of all the segments is k, thus the entire circle is owned by some item.

Place k markers around the circle which are of uniform distance apart. So each marker is 1 unit distance away from its two neighbors. Now imagine spinning the circle like a wheel while the markers stay fixed. We can alternatively imagine rotating the markers while the circle stays fixed. After spinning, select the k items whose segments are touching a marker.

Proof of correctness. The proof takes two parts. We first prove that the number of items selected is k. We then prove that the probability of each item xi being selected is yi. Since the markers are 1 unit apart and the maximum length of a segment is 1, then each segment will almost surely be touching at most 1 marker. Therefore, the number of items that will be selected is k.

For item xi, consider the location of the center of its segment ci. For xi to be selected, ci needs to be within yi/2 distance from a marker (from either side of the marker). Therefore, we can consider the equivalent problem where each marker is the center of a segment (of the circle) of length yi, and xi is selected if, after spinning, its center ci falls inside one of the markers' segments. Viewing the problem this way allows for an easier probabilistic analysis. The total length of segments of the markers is k*yi. The total length of the circle is k. So the probability that xi is selected is exactly k*yi/k = yi.


Benjamin Darfler said...

Dude, that sounds like something I would get asked at a Google interview. I lost track of the question by the end.

Yisong said...

I think the degree of difficulty of this problem is appropriate for a technical interview. I think interview questions typically don't deal with probability that much... something that I hope will change in the future.