Evolutionary Stable Strategies (ESS)

A very short intro to evolutionary game theory

Game theory developed to study the strategic interaction among rational self regarding players (players seeking to maximize their own payoffs). However, by the early 1970’s the theory underwent a transformation and part of it morphed into evolutionary game theory, which allows to increase our understanding of dynamical systems, especially in biology and, more recently, in psychology and the social sciences, with significant ramifications for philosophy. The players are not required to be rational at all, but only to have (perhaps hardwired) strategies that are passed on to their progeny. In short, the notion of player is displaced by that of strategy, and consequently the notion of a player’s knowledge, complete or incomplete, is dispensed with. What drives systems is not the rationality of the players but the differential success of the strategies.

As before, we consider only two-player games. A game with strategies s1,…..sn for both players is strategy symmetric (symmetric, in brief) if:

when i=j the payoffs for the two identical strategies are the same, which means that along the main diagonal (top left to bottom right) the payoffs in each box are the same
the payoffs in the remaining boxes on one side of the main diagonal are mirror images of their counterparts on the other side.

For example, The Prisoners’ Dilemma is a symmetric game. Along the main diagonal (top left to bottom right), the payoffs are the same in each box that is, (1,1) and (6,-6); moreover, we have (-10, 10) in the top right box and (10, -10) in the bottom left, which are mirror images of each other.

	S	C
S	1;1	10;-10
C	-10;10	-6;-6

A symmetric matrix can be simplified by writing only the payoffs of the row strategies, as those of the column strategies can be easily obtained by exploiting the symmetry of the game. So, the previous matrix can be simplified as

	S	C
S	1	10
C	-10	-6

For simplicity, from now on we only deal with symmetric games.

Evolutionary Stable Strategies (ESS)

An important concept of evolutionary game theory is that of evolutionarily stable strategy (ESS). Although the notion of ESS is not central to evolutionary games, it is important because it points us towards asking what the conditions for invasion are.

To understand it, we need some new notions.

Imagine now that we keep repeating a symmetric game (each round is called a ‘stage game’) with random pairing in an infinite population in which the only relevant consideration is that successful players get to multiply more rapidly than unsuccessful ones. (The demand that the population is theoretically infinite excludes random drift). Suppose that all the players (the incumbents) play strategy X, which can be a pure or a mixed strategy. If X is stable in the sense that a mutant playing a different available strategy Y (pure or mixed) cannot successfully invade, then X is an ESS.

Some terminology will help. If by E(A,B) we understand the payoff resulting from playing strategy A against strategy B (A and B could be identical), and C is any available strategy (C could be A or B) then

· A is a strict best reply to B if and only if E(A,B)>E(C,B), namely, when every other available strategy does worse than A when playing B

· A is just a best reply to B if and only if E(A,B)≥E(C,B), namely, when no other available strategy does better than A when playing B.

X is an ESS if either one of the two following conditions obtain

E(X,X)>E(Y,X), that is, X is a strict best reply to itself
E(X,X)=E(Y,X) and E(X,Y)>E(Y,Y), that is, X is just a best reply to itself but a strict best reply to Y.

Obviously, if (1) obtains, the Y invader typically loses against X, and therefore it cannot persist because all X’s but one will do better than Y, which means that Y’s payoff will be much worse than the average. If (1) does not obtain and (2) does, the Y invader does as well against X as X itself, but it loses to X against other Y invaders, and therefore it cannot multiply, that is, only one Y could persist. In short, Y cannot successfully invade a population of X’s when X is a strict best reply to itself or X is just a best reply to itself but a strict best reply to Y. For example, consider Stag Hunt and a population of Stags. Then Hare cannot invade because E(S,S) > E(H,S), which is condition (1) above.

(Question: Is Hare an ESS?)

It is possible to introduce a strategy that is stronger than an ESS, namely, an unbeatable strategy. Strategy X is unbeatable if, given any other available strategy Y

E(X,X)>E(Y,X) and E(X,Y)>E(Y,Y), namely, X is strict best reply both to itself and to the other strategy.

An unbeatable strategy is the most powerful strategy there is because it strictly dominates any other strategy; however, it is also rare, and therefore of very limited use. For example, in Stag Hunt S, although an ESS, is not unbeatable because E(S,H) < E(H,H).

Given a strategy X, the following relations obtain, with the arrow indicating entailment:

X is unbeatable → X is a strict best reply to itself→ X is a just best reply to itself but a strict best reply to any other available strategy→ X is just a best reply to itself and to any other available strategy

For example, in The Prisoners Dilemma squealing is unbeatable, and therefore also a strict best reply to itself, etc.; by contrast, in Stag Hunt, S is a strict best reply to itself, but it is not unbeatable.

A few final points about ESS should be noted:

Every ESS is a strategy in a Nash equilibrium in the stage game, although the reverse is not true (not all Nash equilibriums are made up of ESS).
A strict Nash equilibrium in the stage game in which both players follow the same strategy is made up of ESS. So, for example, in The Prisoners’ Dilemma the strict Nash equilibrium (which is also a dominance equilibrium) is constituted by ESS. Squealing does better against squealing than keeping silent: in a population of squealers keeping silent is a losing strategy, and therefore it cannot invade.
If E(X,X)=E(Y,X) and E(X,Y)>E(Y,Y) obtains, one Y invader would do as well as any X, which means that an ESS need not guarantee the highest payoff.

A very famous version of ESS occurs in Hawk-Dove, a biology oriented version of Chicken.

We now turn to a more general approach to evolutionary games.

Evolutionary Dynamics

We just saw that in a population in which an ESS has already taken over, invasion does not occur successfully. However, under which conditions does a strategy take over in a population? What happens if a game in an infinite population is repeated indefinitely? The answer comes from evolutionary dynamics, which studies the behavior of systems evolving under some specific evolutionary rule. The basic idea here is that of the replicator, an entity capable of reproducing, that is, of making (relevantly) accurate copies of itself. Examples of replicators are living organisms, genes, strategies in a game, ideas (silly or not), as well as political, moral, religious, or economic customs (silly or not). A replicator system is a set of replicators in a given environment together with a given pattern of interactions among them. An evolutionary dynamics of a replicator system is the process of change of the frequency of the replicators brought about by the fact that replicators which are more successful reproduce more quickly than those which are less successful. Crucially, this process must take into account the fact that the success, and therefore the reproduction rate, of a replicator is due in part to its distribution (proportion, frequency) in the population. For example, when playing Chicken, although drivers do well against swervers, in a population of drivers a single swerver does better than anybody else. So, it will reproduce more quickly the others. However, at some point or other, there will be enough swervers that the drivers will start to do better again. It would be nice to know whether there is some equilibrium point, and if so what it is.

Since the differential rate of reproduction determines the dynamics of the system, we need to be more precise and specify what we mean by ‘more quickly’. This is determined by the dynamics of the system; the one we are going to study is replicator dynamics. There are other models that plausibly apply to evolution, but replicator dynamics is the easiest and the most often used, at least at a first approach. Replicator dynamics makes two crucial assumptions:

There is no random drift; in other words, there are no random events interfering with the results of differential fitness, the fact that some individuals are favored in relation to others. In small populations, random drift is a significant dynamical force, an issue we shall have to deal with later.
The interactions among players (strategies) are random; that is, the probability of strategy S meeting strategy H is the frequency of H. For example, if 1/3 of the population is H, the probability of an individual playing an H is 1/3 and that of playing an S is 2/3.

In addition, we shall restrict ourselves to studying repeated games whose stage games are symmetric and have only two players, so that the math remains easy.

To understand replicator dynamics, we need to introduce a few notions.

The first is that of rate of change. Experience teaches that things often change at different rates. For example, sometimes accidentally introduced species multiply more quickly than native species: in other words, their rate of growth is greater than that of native species, and this typically results in a decline of the natives. So, if at the moment of introduction the frequency of the non-native species was p and that of the native species was q= 1-p, with q >> p, after some time the situation may change and become p>q. This means that p has a positive rate of change (it increases) while q has a negative rate of change (it decreases). Mathematically, we express this by writing

D(p) > 0 and D(q) <0 ,

where D(p) (the derivative of p with respect to time) means ‘the rate of change of p’, and similarly for q.

So, suppose a population P increases by 30% every second; then if in the beginning p=100, then after one second, p₁= 130, after 2 seconds, p₂ is given by p₁ plus 30% of p₁, namely 169, and so on, and D(p) > 0.

The second notion is that of expected payoff of a pure strategy. Suppose a pure strategy s is played against strategies S1, …,Sn, and that Pr(Si) is the probability that Si is played. If by E(s) we understand the expected payoff of strategy s, then,

E(s)=E(s,S1) Pr(S1) + …..+ E(s,Sn) Pr(Sn).

That is, if by Si we denote a generic S, the expected payoff of s is the sum of the payoffs of s against each of the Si times the probability that Si is played. For example, consider the matrix below,

	S	H
s	4	1
h	2	3

and suppose that in the columns S is played 1/3 of the times and H 2/3 of the times. Then E(s) is

4(1/3) + 1(2/3) = 2.

E(h) is

2(1/3) + 3(2/3) = 8/3.

The third notion is that of average payoff of a set of strategies, and to understand this, we need to consider the notion of average. Suppose that in a group of boxes, 1/3 weigh 30 kilos each, ½ weigh 20 kilos each and 1/6 weigh 60 kilos each. Then the average weight ĀW is:

ĀW =30x1/3 + 20x1/2 + 60x1/6 = 10+20+5=30.

In words, the average weight is the sum of all the 3 weights, each multiplied by its frequency. (Since 1/3 of the boxes weigh 30 kilos, we multiply 30 by 1/3, and so on). Similarly, if S and H are the two available strategies, the average payoff is

ĀE= E(S)Pr(S) + E(H)Pr(H)

Namely, the expected payoff of S times the probability (the frequency with which) S is played, plus the expected payoff of H times the probability (the frequency with which) H is played.

For example, consider the following Stag Hunt matrix in replicator dynamics, and suppose that Pr(S)=p, so that Pr(H)=1-p

	S	H
S	3	0
H	2	1

Table 12

Then, first we calculate E(S) and E(H):

E(S)=3p+0(1-p)=3p,

E(H)=2p+1(1-p)=p+1.

Then, we calculate the average expected payoff ĀE, which is E(S) times the probability that S is played plus E(H) times the probability that H is played. As the probability that S is played is p and that that H is played is 1-p,

ĀE=E(S) x Pr(S) + E(H) x Pr(H)= 3p² + (p+1)(1-p)= 2p² +1.

In replicator dynamics, if Pr(S)=p, the dynamical equation (the equation ruling the behavior of the system through time) is:

D(p)= [E(S) – ĀE]p.

In other words,

the rate of change of the frequency of a strategy (in this case S), is proportional to the difference between the expected payoff of S and the average payoff.

Consequently, when S’s expected payoff is greater than the average payoff, the frequency of S increases, and when it’s smaller then the frequency of S decreases. Hence, in our example we have:

D(p)= [E(S) – ĀE]p = [-2p² +3p-1]p.

Obviously, when D(p)=0 the frequency of S (that is, p) does not change. The values of p for which the frequency of S does not change are called “fixed points”. So, let us find the fixed points in our example, that is, let us find when

[-2p² +3p-1]p=0.

Obviously, one fixed point is p=0. For the other two, we need to solve

-2p² +3p-1=0,

which gives p=1 and p=1/2. So, when p=0, or p=1, or p=1/2, the frequency of S does not change. But what happens when p is not equal to any of the three fixed points? Let us study the plot of

D(p)=[-2p² +3p-1]p.

As we can verify by substituting 1/3 for p, when 0<p<1/2 the rate of growth of p is negative, and when 1/2<p<1 the rate of growth of p is positive (just substitute 2/3 for p, for example). Since for p=1/2 the growth rate of p is zero, the plot looks, more or less, like this:

If you want to get a precise graph, study the function or go to

http://www.webgraphing.com/graphing_basic.jsp

If at some time p<1/2, as the growth rate is negative p will eventually become zero, that is, the strategy S will disappear; by contrast, if at some time p>1/2, then S will become fixated, that is it will remain the only strategy (H will disappear). If p=1/2, then exactly half of the population will play S and half H. However, this equilibrium is not stable in the sense that even a minor deviation from it will push one of H or S to extinction and the other to fixation, both of which are stable.

The interval (0,1/2) is the basin of attraction of which H is the attractor, and the interval (1/2,1) that of S. The fixed points p=0 and p=1 are asymptotically stable because each is an attractor of a basin of attraction. If the basin of attraction of an attractor contains the whole interval in which p is defined, or at least all the interior points, then the attractor is globally stable. In our example, no fixed point is globally stable. 1/2 is the interior fixed point and, as we saw, it is unstable.

Replicator dynamics of a generalized 2 by 2 symmetric game

A symmetric game with 2 strategies, A and B, can be represented by the following payoff matrix, where the payoff are those of the row strategies:

	A	B
A	a	b
B	c	d

It turns out that the replicator dynamics does not change if in any column we add or subtract the same quantity from all the boxes. Hence, we can reduce the matrix by subtracting c from the first column and d from the second, obtaining

	A	B
A	a-c	b-d
B	0	0

Let us now determine the dynamics, with Pr(A)=p.

E(A)=(a-c)p+(b-d)(1-p).

Because of our matrix manipulation E(B)=0, and consequently the average expected payoff is simply

ĀE = p[(a-c)p+(b-d)(1-p)]+0.

Hence,

D(p) = p[(a-c)p+(b-d)(1-p)-p²(a-c)-p(b-d) (1-p)].

After a bit of algebra, we get

D(p) = p(1-p)[(a-c)p+(b-d)(1-p)],

that is,

D(p)=p(1-p)[E(A)].

Hence, D(p) = 0 when p=0, or p=1, or E(A) = 0.

Note that solving E(A) = 0 gives the interior fixed point, we can employ a trick. Here is a highly simplified procedure for finding the interior point:

· Reduce the matrix by setting strategy B’s payoffs equal to zero and modifying the other payoffs accordingly

· Solve E(A) = 0.

There are five possible cases in a game:

A strongly dominates B. As replicator dynamics wipes out strongly dominated strategies, A will reach fixation. If A weakly dominates B, then only one B will be left. B strongly dominates A. Then, B will reach fixation. If the domination is weak, only one A will remain.
A is the strict best response to A and B to B. Then, a>c and d>b, so that in the reduced matrix a-c > 0 and b-d < 0. The interior point determines an unstable equilibrium. The system is bistable, meaning that one of the two strategies will reach fixation, but which does depends on the initial strategy distribution, that is, the initial value of p. The strategy that has the larger basin of attraction is risk dominant: if the initial distribution of the two strategies is random, on average the risk dominant strategy will reach fixation more often.
A is the strict best response to B and B to A, that is, b>d and c>a, so that in the reduced matrix a-c < 0 and b-d > 0. Then the interior fixed point determines a globally stable equilibrium: the system will converge to it independently of the original distribution. A and B will coexist in a predetermined fixed ratio. The system is ergodic, meaning that the final state is independent of the initial conditions.
A and B are neutral: a=c and b=d. Then selection is neutral, D(p)=0 at all times, and the original strategy distribution p will be preserved because random drift is excluded.

In cases (1)-(2), if you do the calculation you may get nonsensical results like probabilities that are negative or greater than zero, or the collapse of the interior point onto 1 or 0. So, it’s cleaner simply to eliminate the dominated strategy and just say that the system is ergodic, with the dominant strategy becoming fixated or almost fixated (one and only one member of the weakly dominated strategy would remain).

There are some interesting connections between dominance, Nash equilibriums, ESS, and replicator dynamics.

Nash equilibriums determine fixed points. However, fixed points such as p = 1 are not associated with a Nash equilibrium.
If the fixed point p is associated with an ESS, then p is asymptotically stable. The converse need not be true.
The fixed point p is associated with an ESS which uses every strategy with positive probability (that is, greater than zero) if and only if p is globally stable.

An example: Chicken

Consider the game of escalation/retreat Chicken, allegedly played by (stupid) teenagers in the 50’s. (A variation of it is played by people, institutions, and countries, or by many animals when they engage in fighting for mates). Two people drive their cars directly at each other until one or both swerve or they crash into each other. Suppose that payoffs are: swerving while other does not: 0; both swerve: 5; neither swerves: -10; continuing on while the other swerves: +10. Here is the payoff matrix, with “S” for swerve and “C” for continue straight:

	S	C
S	5	0
C	10	-10

Now suppose the following:

· A swerver always swerves and a straighter always goes straight; the traits are hardwired, as it were.

· The payoffs of the game are proportional to Darwinian fitness, the number of viable offspring one leaves.

How will the frequencies of swervers and straighters change? Will they reach an equilibrium point? Imagine that a large population is made up of swervers, and suppose we drop a straigther among them. Initially she will do very well because she meets only swervers, and therefore she will leave a lot of straighter offspring. So, the percentage of straighters in the population will increase steeply. However, eventually there will be enough straighters that they’ll meet each other frequently enoughto do worse than swervers. So, if the initial population is composed of both straighters and swervers, there will never be 100% straighters or 100% swervers. Is there an equilibrium point?

The reduced matrix is

	S	C
S	-5	10
C	0	0

Setting E(S)=0, we obtain -3p + 2 = 0, which solves for p= 2/3. Since S is the best reply to C and C is the best reply to S, the graph representing the system’s evolution is

where p is the percentage of swervers and D(p) is the rate of change of p. The system will end up with 2/3 of the players swerving, no matter what the initial distribution is (aside from p=0 or p=1, both of which are, however, unstable). In other words, the system is ergodic. In this system, the equilibrium p= 2/3 is both accessible (the system will get there) and stable (once there the system will stay there).

Another example: Rock, Paper, Scissors

Uta Stansburiana is a lizard whose morph distribution can be understood by considering the game of Rock-Paper-Scissors, a particularly interesting case of a game with three strategies. However, before we do that we need to look at ternary plots, a common way to represent the phase space of such games.

Consider the equilateral triangle RSP and the segments DA, DB, and DC, which are the distances of D from side RS, SP, and PR, respectively. Vertex R represents the strategy Rock, vertex S Scissors and vertex P Paper. Any point D in the triangle, including the interior area, sides, and vertices, represents a probability distribution of the strategies, with distance DA representing the frequency of Paper, DB that of Rock, and DC that of Scissors. Note that any two of such distances fully determines the strategy distribution. The distance between a vertex and the opposite side is normalized, which means that when point D coincides with a vertex, that vertex’s strategy is the only left. For example, when D coincides with P the frequency of P is 1. By the same token, if D is located on a side, then the frequency of the strategy of the opposing vertex is zero. For example, if D is located on RS, then the frequency of P is zero.

Rock-Paper-Scissors is a game in which Rock (R) beats Scissors (S), Scissors beats Paper (P) and Paper beats Rock, so that the relation ‘X beats Y’ is intransitive.

Assuming that winning and losing represent invasion rates of one species or one morph against another, the generalized matrix for the game can always be reduced to

	R	S	P
R	0	a	-b
S	-c	0	d
P	e	-f	0

Table 1

where the invasion rate is between and including 0 and 1, with 1 representing the case in which the winning strategy completely replaces the defeated one. For example, if a=.6, then R wins (invades) against S 60% of the times. Note that when a=c, b=e, and d=f the game is zero sum; moreover, when a=b=c=d=e=f=1 the classic RPS game obtains.

In the replicator dynamics of Rock, Scissors, Paper the following is true:

There is only one internal fixed point (an equilibrium point)
If p is the frequency of P, r that of R and s that of S at the internal fixed point, then the following holds:

I. p=αa, r=αd, s=αe,

II. with α = (a+d+e)^-1, which is the inverse of the sum of the invasion rates.

(I)-(II) entail that the frequency of a strategy at the interior equilibrium is not determined by its own invasion rate but by that of the strategy it invades. For example, the frequency of P is proportional to the invasion rate of R, which is a. Consequently, if a strategy has the highest invasion rate, then its invader has the highest frequency at equilibrium, and if it has the lowest invasion rate, then its invader has the lowest frequency.

There are three cases based on the value of the matrix determinant Δ=bcf-ade (product of the being invades rates minus product of invasion rates):

If Δ < 0, then the interior fixed point is an unstable center, with trajectories spiraling out asymptotically to the sides.

If Δ = 0, then the interior fixed point is a center of stable orbits circling around it.

If Δ > 0, then the interior fixed point is globally stable, with the orbits exhibiting damped oscillations converging to it.

In computer simulations, when Δ < 0:

The orbits spiral out towards the sides eventually touching one of the sides. (This is due to computer rounding)
The species with the lowest frequency at the interior equilibrium has the lowest frequency along every orbit, and consequently the greatest probability of extinction.

Hence, strategy X with the lowest invasion rate is the one most likely to survive, as its invader will vanish, with the result that X will displace that remaining strategy. In short, the game results in the survival of the weakest, if by that we understand that which is least successful at invading. In practice, if strategies are replaced by species or morphs, and a disease, or human intervention, weakens one of them by diminishing its invasion rate, then that species or morph is the most likely to survive.

As an example of RPS, consider the following matrix, where payoffs are invasion rates:

	R	S	P
R	0	.2	-.4
S	-.5	0	.8
P	.3	-.6	0

Then, α = 10/13, so that r = 8/13, s = 3/13, and p = 2/13. As the determinant Δ = (.4x.5x.6 –.2x.8x.3) > 0, it follows that under replicator dynamics the interior equilibrium is globally stable.

Many species or morphs from unicellular organisms to vertebrates, can be studied using RPS, including E. Coli (a microbe in our guts), Uta Stansburiana (a California lizard and the first species to which RPS was successfully applied), Ischnura Elegans (a damselfly), and Paracercis Sculpta (a marine isopod). More generally, RPS can be used to investigate any system in which ‘power’ relations are intransitive.

For an interesting and entertaining lecture on evolutionary game theory, watch this public lecture by Prof. Nowak at Harvard.

Exercises

1. Determine the evolution of the following game under replicator dynamics.

	S	H
S	5	0
H	2	2

2. Determine the evolution of the following game under replicator dynamics.

	A	B
A	3	1
B	4	0

Determine the evolution of the following version of Chicken under replicator dynamics (D= drive straight; S= swerve).

	D	S
D	-5	3
S	-2	0

Determine the evolution of the following game under replicator dynamics. Does the game look familiar?

	S	C
S	1	-5
C	4	0

Hint. Before you launch into calculations, ask yourself whether the one-shot game above is strong dominance solvable. Would you think that strongly dominated strategies survive replicator dynamics?

Determine the evolution of the following game under replicator dynamics.

	A	B
A	3	1
B	3	1

The quasi-replicator dynamics of the iterated Prisoners Dilemma

In replicator dynamics, two players meet randomly, play a one-shot game and then separate, as each randomly meets a player again. As dominated strategies do not survive replicator dynamics, defecting reaches fixation in The Prisoners Dilemma. What happens if we play The Prisoners Dilemma with the evolution equation of replicator dynamics but with direct reciprocity, namely by having the same two players repeat the game more than once, with random drift, with the presence of mutations and with occasional strategy execution errors? To avoid the temptation of using backward induction to defecting all the times, let us suppose that the players do not know how many times they are playing each other; all they know is that after each round they have a probability p of playing again, so that the length of the average play is 1/(1-p) rounds. We may consider a general matrix for cooperation and defection in which only the row payoffs are given:

	C	D
C	R	S
D	T	P

One gets R(eward) for mutual cooperation, P(unishment) for mutual defection, S(ucker) for cooperating against a defector, and T(emptation) for defecting against a cooperator. The Prisoners’ Dilemma obtains if T>R>P>S. We could think of the game as follows. The cooperator helps at a cost c and the receiver of the help gets a benefit b. Defectors do not help and therefore incur no costs. Then: R=b-c; S= -c; T=b; P=0.

To make things more interesting, in addition to ALLC (always cooperate) and ALLD (always defect), let us consider some reactive strategies that act on the basis of what happened in the previous stage.

TFT (tit-for-tat) acts as follows: it starts by cooperating and then it considers the opponent’s last move; if the opponent cooperated TFT cooperates, and if the opponent defected, it defects.

GTFT (generous tit-for-tat) acts as follows: it’s like TFT with one difference: every so many moves (say, 1/3 of the times) it cooperates even if in the previous stage the opponent defected.

WSLS (win-stay; lose-shift) acts as follows: WSLS looks at its own payoff in the last stage; if it equal to T or R it considers his payoff a success and it repeats the previous strategy; if not, it shifts strategy. In short, if the previous payoff was one of the two highest, it keeps doing the same thing; if it wasn’t, it switches.

Martin Nowak has run programs modeling the following scenario. The matrix has R=3,T=5,P=1, and S=0. There is a large number (100 in the run) of randomly chosen and uniformly distributed strategies. There is direct reciprocity; occasionally, the strategies make mistakes, simulating human behavior; new strategies are put into play, simulating mutations, and neutral drift is allowed. What typically happens (M. Nowak, Evolutionary Dynamics, ch. 5) can be visualized as follows:

In a random mix of strategies, ALLD does very well, almost taking over. At that point, even a small cluster of TFT, already present or introduced by mutation, will start expanding because it will defect against defectors (ALLD) but cooperate with cooperators (other TFT’s mostly). Once TFT becomes abundant, its unforgiving nature makes it succumb to GTFT. The reason is that since mistakes sometimes occur, a TFT playing another TFT might defect instead of cooperating. This will prompt the latter to defect as well in the next stage, thus starting a cycle of cooperation/defection. By contrast, GTFT will at some point try to cooperate again, breaking the cycle and receiving higher payoff. In short, GTFT quickly recovers from mistakes while TFT does not. What happens now depends on how generous GTFT is. If it is sufficiently vindictive, it takes over and becomes stable. However, if it is too generous, once it has taken over it will do no better than an ALLC, which may arise by mutation. If the game is played long enough, ALLC will take over by neutral drift. (In a population of N individuals with equal fitness, the probability that eventually all the population will be the descendent of a given individual A is 1/N. Hence, if the game is played long enough, this eventuality will come about). At this point, an ALLD mutation will result in an ALLD explosion. The cycle will start again. However, if when the frequency of ALLC’s is high WSLS arises as a mutant, the cycle is broken. When two WSLS A and B play each other, if they cooperated in the previous stage, they’ll keep cooperating. If A makes a mistake and defects, here’s what happens:

A: CCCDDCCC…

B: CCCCDCCC…

Cooperation, in other words, will resume after two stages.

If A is a WSLS and B an ALLC, if they cooperated in the previous stage they’ll keep cooperating. However, if A makes a mistake and defects, it will keep defecting against the much too nice B:

A: CCCDDDD…

B: CCCCCCC…

In short, WSLS exploits ALLC’s goodness while cooperating with other WSLS’s.

When a WSLS meets an ALLD, the ALLD is better off, as we have

WSLS: CDCD….

ALLD: DDDD…

Note that ALLD averages (P+T)/2 per game. Hence, as long as R>(P+T)/2 , the WSLS playing each other will average more than an ALLD playing a WSLS. In other words,

E(WSLS,WSLS)>E(ALLD,WSLS),

a strict Nash that comes close to delivering an ESS strategy. (It just comes close because the population is finite and random events, random drift for example, are allowed). In other words, one (or just a few) ALLD mutant will not invade. If R≤(P+T)/2 then a stochastic variant of WSLS that cooperates after mutual defection only with probability less than 1 will take over. Occasionally, the system cycles back to ALLD, but the mechanism is unclear.

The limits of replicator dynamics

Replicator dynamics has some features that limit its application. Here we consider two:

· The dynamics we considered has no mutation, in the sense that every replicator produces an identical copy of itself. A side consequence of this is that if a strategy has disappeared, it will never reappear again. (By the way, this is why p=1 and p=0 are always fixed points, even when they are not stable). There are ways of dealing with this. For example, one could modify the replicator equation by introducing a term that makes, say, S turn into H with some probability q>0, the simplest case of mutation; alternatively, one can use Markov chains. However, as this complicates things we shall not do it. Even so, we can note the following. In our Stag Hunt example, p=1/2 is an unstable interior point, which means that random mutations will move the system in either of the two attraction basins with equal probability. However, if we change the payoff matrix to (5,5) when S meets another S, the interior point becomes p=1/4 (check it out!), which means that, given initial uniform distribution, on average random mutations will push systems more often in the basin of attraction with p=1 as the attractor; hence, on average systems will spend most of their lives in that basin. Of course, if there are n individuals in the population, there is a probability qⁿ that in the transition from one generation to another, all the S’s (or enough of them) become H, thus bringing H to fixation; however, even for relatively small n, that probability is likely to be really negligible. For example, if q=10% and n=16, the resulting probability is 10^-17, which is really phenomenally low, as 10¹⁷ is roughly the age of the universe in seconds.

· The population must be infinite (in practice, very large) to avoid random drift; however, many populations are relatively small and random drift is unavoidable. Consequently, if we are modeling a population that is not very large we need to use a different, and more complex, procedure than replicator dynamics. For more, see Nowak’s book.

Spatial Games

Replicator dynamics assumes random interactions among strategies. But this, as we noted, is unrealistic in many contexts because segmentation occurs. So, in Stag Hunt we can then think of groups of S’s in an S-structure interacting with H’s grouped together in an H structure. Then things can be dramatically different from random interaction situations.

Of course, the system’s behavior will depend (also) on the reproduction rules; types of evolutionary dynamics different from replicator dynamics are obvious. For example, a simple dynamics could be:

Every individual looks at its payoffs and at those of its immediate neighbors, and in the next round all individuals, or a certain percentage of them, simultaneously adopt the strategy that produced the highest payoff.

Note that this rule is relatively plausible if replicators are cultural items (memes), as people do tend to adopt practices that are more successful than theirs. (Of course, this is a simplification, as cultural inertia and conformism play an important role; however, these can be modeled as well, even if the model becomes complicated). The way to make these ideas more precise is to look at spatial games.

Consider a spatial grid in which each individual occupies a position and interacts with all of its neighbors. The payoffs of each interaction are summed and in the next round:

each player adopts the strategy with the highest payoff in the neighborhood
all the updating occurs at the same time, which means that generations are not overlapping


	D1	D2	D3	D4
	D12	C1	C2	D5
D13	D11	C3	C4	D6
	D10	D9	D8	D7

Consider a Stag Hunt game where E(S,S)=3, E(S,H)=0, E(H,S)=2, and E(H,H)=1. Here we have 4 cooperators (stags) surrounded by 12 defectors (hares). We may imagine this grid as a small part of a larger one that is wrapped around in the shape of a doughnut so that there are no boundary effects. A cell’s neighborhood is the von Neumann neighborhood, constituted by the 4 cells sharing a side with it. For example, C3’s neighborhood is constituted by D11, C1, C4, D9. Hence, the fate of C3 depends on its strategy, those of its neighbors D11, C1, C4, D9, and those of its neighbors’ neighbors. Let us look at C3’s fate. It will obtain a payoff of 6 from cooperating with C1 and C4 and a payoff of 0 from attempting, and failing, to cooperate with D9 and D11. In short, its payoff will be 6. The same is true for the remaining 3 cooperating cells. Consider now D11. It will have a payoff of 3 from its interaction with D13, D12, and D10, and a payoff of 2 from its interaction with C3, for a total of 5. The same is true for the remaining defecting cells. Hence, in the next round the twelve defecting cells will turn into cooperators, and the cooperator square will expand, eventually taking over. Note that cooperation is more successful in this spatial game than under replicator dynamics.

A standard way to study an evolutionary game is to consider the conditions for invasion. So, imagine that cooperators have taken over and that one mutates into a defector (hare). Its payoff will be 8, while that of each of its cooperating neighbors will be 9, which means that the defector will vanish in the next round. Two neighboring defectors will have a payoff of 7, while their neighboring cooperators will get 9; hence, the defectors will vanish. With 3 neighboring defectors, the defectors bordering with three cooperators get a payoff of 7, against one of 9 for the cooperators. More defectors will fare even worse. So, a community of cooperators is immune from invasion from defectors, not a surprise, since Stag is an ESS in replicator dynamics.

Imagine now a community of defectors with one mutant cooperator. The cooperator will have a payoff of 0 while each defecting neighbor will have one of 5, with the result that the cooperator trait will vanish. Two neighboring cooperators will obtain each a payoff of 3, while each defecting neighbor will get one of 5; consequently, the cooperator trait will not survive. Three neighboring cooperators will evolve into a group of four cooperators with the shape of a latin cross (“+”), with the central cooperator getting 9; this group will hold its own. As we saw, a square of four cooperators will take over. In short, in the spatial version of Stag Hunt we considered, cooperation is much more successful than defection when compared with replicator dynamics.

In addition to von Neumann neighborhoods, one can use Moore neighborhoods, in which a cell’s neighbors are those touching it, so that the neighbors are all those reachable from the cell by a king move in chess. Grids may become n-dimensional, or, given an appropriate metric, one might define a distance r from a cell such that only and all the cells within r are neighbors, with the result that the number of neighbors may vary from cell to cell. Analogously, the transition rules may be changed. For example, a cell might become a cooperator (defector) if the average payoff of all the neighborhood cooperators (defectors) is higher than that of the neighborhood defectors (cooperators). Or, the probability of change might be linked to the payoff of the most successful neighbors. Analogously, one can adopt a randomly asynchronous updating: a cell is chosen at random, the relevant payoffs determined and then only that cell is updated, thus mimicking overlapping generations. There is, then a great amount of very different dynamics producing very different types of evolutions.