Econometrica: Mar, 1997, Volume 65, Issue 2
Prediction, Optimization, and Learning in Repeated Games
https://doi.org/0012-9682(199703)65:2<275:POALIR>2.0.CO;2-I
p. 275-309
John H. Nachbar
Consider a two-player discounted repeated game in which each player optimizes with respect to a prior belief about his opponent's repeated game strategy. One would like to argue that if beliefs are cautious, then each player's best response will be in the support, loosely speaking, of his opponent's belief and that, therefore, players will learn as the game unfolds to predict the continuation path of play. If this conjecture were true, a convergence result due to Kalai and Lehrer would imply that the continuation path of the repeated game would asymptotically resemble that of a Nash equilibrium. One would thus have constructed a theory in which Nash equilibrium behavior is a necessary long-run consequence of optimization by cautious players. This paper points out an obstacle to such a theory. Loosely put, in many repeated games, if players optimize with respect to beliefs that satisfy a diversity condition termed neutrality, then each player will choose a strategy that his opponent was certain would not be played.