Saturday, April 24, 2010

The Multi Armed Bandits Problem

Three armed bandits are in a gunfight with each other. The gunfight proceeds in a sequence of rounds. In each round, each bandit chooses one of the bandits to target, and all three shoot simultaneously. Each bandit (while alive) shoots once per round. Each bandit has a certain accuracy, and the probability of hitting or missing a target is independent of the target (i.e., it doesn't matter who the bandit is shooting), and also independent of all other shots.

Each bandit knows the others' (possibly randomized) strategy. Any randomness used by any of the bandits is hidden from the other bandits (i.e., the distribution of actions that defines any bandit's strategy is independent of the other bandits' strategies).

The gunfight ends when at most one bandit remains alive. A bandit "wins" the gunfight if he survives until the last round (he is either the last bandit remaining, or he survived until the very end when all remaining bandits are simultaneously killed). In the degenerate case where all three bandits have 0 accuracy (and thus the gunfight never ends), then all three bandits are declared winners.

What is the Nash equilibrium?

(Thanks to Vasu for helping come up with the name.)

5 comments:

Vasu said...

1) I don't understand what you're trying to say in the second paragraph. The first sentence seems to be your way of saying this is an ex post Nash?

2) Is your set of actions just shooting one of the other bandits? Is there a no-op action? As pointed out by the excellent Stephen Fry on QI , I think it matters.

3) Thanks for the (dis)credit.

Yisong Yue said...

1) If I'm understanding the definition of ex post Nash correctly, then the answer is yes.

2) You have to shoot.

3) You're very welcome =)

David Flint said...

I don't actually know that much about game theory, but it's actually a pretty interesting question and of course a particular Nash equilibrium will depend on the accuracies assigned to each bandit. For a full solution, accuracies would have to be broken into sets and considered independently.

Considering the case where all the accuracies of the bandits are equal though, it appears to me that every set of strategies where each bandit targets another bandit every round is a Nash equilibrium. This is because there is no benefit for one bandit, call them A, to preferentially target another bandit, call them B. Counterintuitively, at least to me, even if bandit B is preferentially targeting bandit A, bandit A still has only a 50% chance of surviving the last bandit if they eliminate bandit B and thus does not increase their chances of survival. To put it another way, the probability a bandit wins is determined not by their strategy, but the strategy of the other two bandits; a bandit really only has the capability to choose a loser. All of these Nash equilibria are unstable because a change in strategy for one bandit does not result in a better for strategy for any other bandit.

I would also note that the way you've phrased the problem, it seems to me that a bandit can choose themself as their target. Obviously, any Nash equilibrium would not have this as part of the strategy since a bandit adopting this strategy could always improve their chances of winning by aiming at one of the other two bandits.

Relatedly, I suspect that a no-op as Vasu suggested could only be contained within a Nash equilibrium for this problem if bandits are allowed to adopt strategies with hysteresis. This is suggested to me by a significant difference between this problem and the problem proposed by Martin Gardner (http://en.wikipedia.org/wiki/Three_way_duel_%28puzzle%29) and referenced by QI. In that problm each duelist shoots sequentially in particular order while in this one all bandits shoot simultaneously, so a bandit has no ability to change their target based on the current target of any other bandit in the current round. Thus, a bandit can not decrease their chances of survival and winning by targeting another bandit. However, if bandits are allowed strategies that rely on the targets of bandits in previous rounds, this can change. In particular, a bandit could choose to target a bandit simply because they targeted them in a previous round. Also, if a no-op were allowed one valid Nash equilibrium, though unstable, would be for all bandits to not shoot or shoot at nothing, regardless the accuracies of the bandits.

Finally, I am quite glad to meet another fan of Stephen Fry and QI.

Yisong Yue said...

David, I'm not sure that what you said about the case where all the accuracies of bandits are equal is true.

Consider the case where all three bandits have 100% accuracy. If A is shooting at B, and B is shooting at C, then the only way C can win is if C shoots at A (in which case, all three bandits die).

David Flint said...

Hrm, excellent catch Yisong! I keep falling into the trap of thinking if a bandit dies they've lost even though as you rightly point out, if all the bandits die simultaneously, they can win.

In the 100% equal accuracy case, your solution concept is a stable Nash equilibrium, one of two.

I've been continuing to think about it, and this suggests to me that there should be a similar stable Nash equilibria for any set of equal accuracies. In particular, each bandit always targeting another bandit in a circular fashion should be a stable equilibrium regardless of the accuracies as long as they are equal.

If that is correct, can we similarly determine if stable Nash equilibria exist for unequal accuracies A > B > C by considering the situation where the accuracy of A is 100%, the accuracy of B is 50%, and the accuracy of C is 0% assuming that these Nash equilibrium hold for all such sets of strictly ordered accuracies? Again, I'm not really sure.