The figure shows the expected motion of a system in which two players using the Bush–Mosteller reinforcement learning algorithm play a symmetric 2×2 game. Payoff T (for temptation) is the payoff a defector obtains when the other player cooperates; R (for reward) is the payoff obtained by both players when they both cooperate; both players obtain a payoff of P (for punishment) when they both defect; and, finally, S (for sucker) is the payoff a cooperator obtains when the other player defects. Both players share the same aspiration threshold and the same learning rate. Noise is the probability that a player takes the action opposite to the one she or he intended. The arrows represent the expected motion at various states of the system, and the background is colored according to the norm of the expected motion.
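The expected-motion field shown in the figure can be computed directly from the standard Bush–Mosteller update rule: after each round, the stimulus is the (normalized) difference between the payoff received and the aspiration threshold, and the probability of the action just taken is reinforced or inhibited in proportion to that stimulus. The sketch below illustrates this under assumed parameter values (the payoffs T, R, P, S, the aspiration threshold, and the learning rate are placeholders, not values from the source), and it omits the noise term for simplicity.

```python
# Hypothetical payoffs satisfying T > R > P > S (assumed values, not from the source).
T, R, P, S = 4.0, 3.0, 1.0, 0.0
ASPIRATION = 2.0   # aspiration threshold (assumed value)
LEARN_RATE = 0.5   # learning rate (assumed value)

# (my action, other's action) -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): R, ("C", "D"): S,
    ("D", "C"): T, ("D", "D"): P,
}
# Largest possible deviation from the aspiration, used to normalize the stimulus.
MAX_DEV = max(abs(v - ASPIRATION) for v in PAYOFF.values())

def bm_update(p, action, payoff):
    """Bush-Mosteller update of a player's cooperation probability p
    after taking `action` and receiving `payoff`."""
    stimulus = (payoff - ASPIRATION) / MAX_DEV   # in [-1, 1]
    q = p if action == "C" else 1.0 - p          # prob. of the action taken
    if stimulus >= 0:
        q += LEARN_RATE * stimulus * (1.0 - q)   # satisfied: reinforce the action
    else:
        q += LEARN_RATE * stimulus * q           # dissatisfied: inhibit the action
    return q if action == "C" else 1.0 - q

def expected_motion(p1, p2):
    """Expected one-step change (E[dp1], E[dp2]) in the two players'
    cooperation probabilities at state (p1, p2) -- the arrows in the figure."""
    dp1 = dp2 = 0.0
    for a1, pr1 in (("C", p1), ("D", 1.0 - p1)):
        for a2, pr2 in (("C", p2), ("D", 1.0 - p2)):
            prob = pr1 * pr2                     # probability of this action pair
            dp1 += prob * (bm_update(p1, a1, PAYOFF[(a1, a2)]) - p1)
            dp2 += prob * (bm_update(p2, a2, PAYOFF[(a2, a1)]) - p2)
    return dp1, dp2
```

Evaluating `expected_motion` on a grid of states (p1, p2) and taking the Euclidean norm of each vector reproduces the kind of vector field and background coloring the caption describes.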