The probabilities of winning a game, a set, and a match in tennis are computed,
based on each player's probability of winning a point on serve; points are
assumed to be independent identically distributed (iid) random variables. Both
two out of three and three out of five set matches are considered, allowing a
13-point tiebreaker in each set, if necessary. As a by-product of these formulas,
we give an explicit proof that the probability of winning a set, and hence a
match, is independent of which player serves first. Then, the probability of each
player winning a 128-player tournament is calculated. Data from the 2002 U.S.
Open and Wimbledon tournaments are used both to validate the theory and
to show how predictions can be made regarding the ultimate tournament
champion. We finish with a brief discussion of evidence for non-iid effects in
tennis, and indicate how one could extend the current theory to incorporate
such features.
1. Introduction
We wish to calculate the probability that one player, A, wins a tennis match
against another player B. It is not enough to know the rankings of A and B,
because there is no unambiguous way to translate rankings into probabilities
of winning [1, 2]. However, it does suffice to know the probability pAR that A
Address for correspondence: Paul K. Newton, Department of Aerospace and Mechanical Engineering
and Department of Mathematics, University of Southern California, Los Angeles, CA 90089-1191;
e-mail: newton@spock.usc.edu
STUDIES IN APPLIED MATHEMATICS 114:241–269
© 2005 by the Massachusetts Institute of Technology
Published by Blackwell Publishing, 350 Main Street, Malden, MA 02148, USA, and 9600 Garsington
Road, Oxford, OX4 2DQ, UK.
wins a rally when A serves, and the probability pBR that B wins a rally when
B serves. Such probabilities have been used to calculate the probability of
winning a game in other racquet sports, such as racquetball [3], squash [4],
and badminton [5]. Models of this type for tennis were first considered by Hsi
and Burych [6], followed by Carter and Crews [7], and Pollard [8]. All of
these analyses, including ours, treat points in tennis as independent identically
distributed (iid) random variables, hence pAR and pBR are taken as constant
throughout a match. A recent statistical analysis of 4 years of Wimbledon data
[9] shows that although points in tennis are not iid, for most purposes this
is not a bad assumption as the divergence from iid is small. Other aspects
of tennis that have been analyzed using probabilistic models include optimal
serving strategies [10], the efficiency of various scoring systems [11], and the
question of which is the most important point [12]. Statistical methods have
also been used to study the effects of new balls [13], service dominance [14],
and the probabilities of winning the final set of a match [15].
Our formulation unifies and extends some of the previous treatments by the
use of hierarchical recurrence relations whose solutions yield the probability
that A wins a game, a set, or a match against B in terms of pAR and pBR . We then
calculate the probability that a player in a 128 player single elimination
tournament reaches the second, third, . . . , or final round, and the probability
that a player who has reached the nth round will win the tournament. We also
provide an explicit proof, based on the solutions of our recurrence relations,
that the probability of winning a set or match does not depend on which player
serves first.
Of course the probability pAR that A wins a rally on serve depends upon the
opponent B as well as upon A. If data are not available for A serving to B, then
data for A playing against players similar to B can be used. We illustrate
this point with data from the 2002 U.S. Open Men's and Women's Singles
Tournaments, and from the 2002 Wimbledon Men's and Women's Singles
Tournaments. The data, shown in Tables 1 and 2, and in Figure 1, agree well
with our theoretical calculation of pAG , the probability that A wins a game
when A serves. In a companion paper (part II), we will compare the theory
with Monte Carlo simulations.
A game in tennis is played with one player serving. The game is won by the
first player to score four or more points and to be at least two points ahead of
the other player. In a set, the players serve alternate games until a player wins at
least six games and is ahead by at least two games. If the game score reaches
6–6, a 13-point tiebreaker is used to determine who wins the set, with the player
who started serving the set serving the first point of the tiebreaker.¹ Then, the
players alternate serves, each serving two consecutive points, until someone
wins at least seven points, and is ahead by at least two points. The winner of
the tiebreaker wins the set with seven games to the opponent's six games. To
win a match, a player must win two out of three sets (women's format), or win
three out of five sets (men's format), with the two players serving alternate
games throughout the match. The initial server in the match is determined by a
coin toss, with the winner given the choice of serving first or receiving first.

¹ In the U.S. Open, a tiebreaker is used in every set, whereas in Wimbledon, in the French Open, and in
the Australian Open, tiebreakers are not used in the third set of a two out of three set match (women's
format), or the fifth set in a three out of five set match (men's format).

Table 1
Data for the Semifinalists in the 2002 U.S. Open Tournament

Player         Points won  Points   Games won  Games    pR      pG      pG
               on serve    served   on serve   served   (data)  (data)  from (5)
Women
S. Williams    240         349      52         57       0.69    0.91    0.89
V. Williams    270         428      56         70       0.63    0.8     0.79
L. Davenport   206         301      45         53       0.68    0.85    0.88
A. Mauresmo    287         457      58         75       0.63    0.77    0.79
Men
P. Sampras     573         781      124        130      0.73    0.95    0.93
A. Agassi      443         676      96         110      0.66    0.87    0.85
L. Hewitt      436         654      91         107      0.67    0.85    0.86
S. Schalken    519         768      107        119      0.68    0.9     0.88

Table 2
Data for the Semifinalists in the 2002 Wimbledon Tournament

               Points won  Points   Games won  Games    pR      pG      pG
               on serve    served   on serve   served   (data)  (data)  from (5)
Women
               276         390      57         64       0.71    0.89    0.91
               273         352      51         62       0.67    0.82    0.86
               252         427      48         66       0.59    0.73    0.71
               241         378      50         57       0.64    0.88    0.81
Men
               450         646      96         107      0.70    0.90    0.90
               516         847      94         128      0.61    0.73    0.76
               457         683      92         110      0.67    0.84    0.86
               483         721      101        114      0.67    0.89    0.86

Figure 1. The probability $p_A^G$ of A winning a game when A serves, i.e., of holding
serve, as a function of $p_A^R$ based on (5). The open circles correspond to data from eight
semifinalists in the 2002 U.S. Open Men's and Women's Singles Tournaments and the open
stars correspond to data from eight semifinalists in the 2002 Wimbledon Men's and Women's
Singles Tournaments. The four leftmost data points represent the combined data from the
semifinalists' first-round opponents in each tournament.
2. Probability of winning a game
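Player A can win a game against player B by a score of (4, 0), (4, 1) or (4,
2), or else the score can become (3, 3), called deuce. Then, A can win by
scoring n + 2 further points while B scores n, for some n ≥ 0. Writing
$p_A^G(i,j)$ for the probability that the score becomes (i, j) with A serving,
the probability $p_A^G$ that A wins the game is

\[ p_A^G = \sum_{j=0}^{2} p_A^G(4,j) + p_A^G(3,3) \sum_{n=0}^{\infty} p_A^{DG}(n+2,n). \tag{1} \]

Here, $p_A^{DG}(n+2,n)$ is the probability that A wins by scoring n + 2 while B
scores n after deuce has been reached, with A serving. It is given by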
\[ p_A^{DG}(n+2,n) = \sum_{j=0}^{n} (p_A^R)^2 \frac{n!}{j!\,(n-j)!} (p_A^R q_A^R)^j (q_A^R p_A^R)^{n-j} = (p_A^R)^2 (p_A^R q_A^R)^n 2^n. \tag{2} \]
Upon using (2) in (1), and summing the geometric series, we get
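\[ p_A^G = \sum_{j=0}^{2} p_A^G(4,j) + p_A^G(3,3)\, (p_A^R)^2 \left[ 1 - 2 p_A^R q_A^R \right]^{-1}. \tag{3} \]

The score probabilities appearing in (3) are, with $q_A^R = 1 - p_A^R$,

\[ p_A^G(4,0) = (p_A^R)^4, \quad p_A^G(4,1) = 4 (p_A^R)^4 q_A^R, \quad p_A^G(4,2) = \frac{5 \cdot 4}{2} (p_A^R)^4 (q_A^R)^2, \quad p_A^G(3,3) = \frac{6!}{(3!)^2} (p_A^R q_A^R)^3. \tag{4} \]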
Now using (4) in (3) gives the probability that A wins a game when A serves,
i.e., that A holds serve:
\[ p_A^G = (p_A^R)^4 \left[ 1 + 4 q_A^R + 10 (q_A^R)^2 \right] + 20 (p_A^R q_A^R)^3 (p_A^R)^2 \left[ 1 - 2 p_A^R q_A^R \right]^{-1}. \tag{5} \]
This equation agrees with that given in [7]. Figure 1 shows $p_A^G$ as a function of
$p_A^R$, based upon (5). The open circles in the figure are data for the semifinalists
in the 2002 U.S. Open Men's and Women's Singles Tournaments, shown in
Table 1, while the stars are data for the semifinalists in the 2002 Wimbledon
Men's and Women's Singles Tournaments shown in Table 2. The leftmost four
points are totals for their first-round opponents in both tournaments. They all
lie close to the theoretical curve.
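As a quick numerical check of (5), the following minimal Python sketch evaluates the probability of holding serve for a given point-winning probability on serve. The function name `p_game` is ours and is introduced only for illustration; it is not part of the original calculations.

```python
def p_game(p):
    """Probability that the server wins a game, following formula (5).

    p is the server's probability of winning a point on serve (iid points).
    """
    q = 1.0 - p
    # Wins by (4,0), (4,1), (4,2): p^4 * (1 + 4q + 10q^2).
    before_deuce = p**4 * (1.0 + 4.0 * q + 10.0 * q**2)
    # Reach deuce (3,3) with probability 20 p^3 q^3, then win from deuce
    # with probability p^2 / (1 - 2pq), the sum of a geometric series in 2pq.
    from_deuce = 20.0 * (p * q) ** 3 * p**2 / (1.0 - 2.0 * p * q)
    return before_deuce + from_deuce


if __name__ == "__main__":
    for p in (0.5, 0.6, 0.7):
        print(f"p_R = {p:.2f} -> p_G = {p_game(p):.4f}")
```

For p = 0.69 the function returns roughly 0.89, and for evenly balanced points (p = 0.5) it returns 0.5, as symmetry requires.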
Let $p_A^S$ denote the probability that player A wins a set against player B, with A
serving first, and $q_A^S = 1 - p_A^S$. To calculate $p_A^S$ in terms of $p_A^G$ and $p_B^G$, we
define $p_A^S(i,j)$ as the probability that in a set, the score becomes i games for
A, j games for B, with A serving initially. Then,

\[ p_A^S = \sum_{j=0}^{4} p_A^S(6,j) + p_A^S(7,5) + p_A^S(6,6)\, p_A^T. \tag{6} \]
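Here, $p_A^T$ is the probability that A wins a 13-point tiebreaker with A serving
initially, and $q_A^T = 1 - p_A^T$.
To calculate $p_A^S(i,j)$, needed in (6), we use the following recursion formulas
and initial conditions:

For 0 ≤ i, j ≤ 6:

if i − 1 + j is even:
\[ p_A^S(i,j) = p_A^S(i-1,j)\, p_A^G + p_A^S(i,j-1)\, q_A^G \tag{7} \]
if i − 1 + j is odd:
\[ p_A^S(i,j) = p_A^S(i-1,j)\, q_B^G + p_A^S(i,j-1)\, p_B^G \tag{8} \]
In both cases, omit the i − 1 term if j = 6, i ≤ 5, and omit the j − 1 term if i = 6, j ≤ 5.

Initial conditions:
\[ p_A^S(0,0) = 1; \qquad p_A^S(i,j) = 0 \quad \text{if } i < 0 \text{ or } j < 0. \tag{9} \]
The scores (7, 5) and (5, 7) are decided by the twelfth game, which B serves:
\[ p_A^S(7,5) = p_A^S(6,5)\, q_B^G, \qquad p_A^S(5,7) = p_A^S(5,6)\, p_B^G. \tag{10} \]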
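The set-score recursion can be sketched in a few lines of Python. This is an illustrative implementation under the stated rules (A serves the odd-numbered games, and no game is played from a score at which the set is already over); the name `set_score_probs` and the dictionary layout are our own choices.

```python
def set_score_probs(pGA, pGB):
    """Return {(i, j): probability that the set score becomes i-j}, A serving first.

    pGA and pGB are the probabilities that A and B hold serve.
    """
    qGA, qGB = 1.0 - pGA, 1.0 - pGB
    S = {(0, 0): 1.0}
    for total in range(1, 13):            # number of games played so far
        for i in range(0, 7):
            j = total - i
            if not 0 <= j <= 6:
                continue
            # The game just played was served by A when i - 1 + j is even.
            if (i - 1 + j) % 2 == 0:
                a_wins, b_wins = pGA, qGA
            else:
                a_wins, b_wins = qGB, pGB
            from_left = S.get((i - 1, j), 0.0) * a_wins    # A won the last game
            from_below = S.get((i, j - 1), 0.0) * b_wins   # B won the last game
            if j == 6 and i <= 5:
                from_left = 0.0    # the set was already over at (i - 1, 6)
            if i == 6 and j <= 5:
                from_below = 0.0   # the set was already over at (6, j - 1)
            S[(i, j)] = from_left + from_below
    # The twelfth game is served by B, so 6-5 and 5-6 resolve on B's serve.
    S[(7, 5)] = S[(6, 5)] * qGB
    S[(5, 7)] = S[(5, 6)] * pGB
    return S


if __name__ == "__main__":
    S = set_score_probs(0.8, 0.7)
    print(f"P(6-0) = {S[(6, 0)]:.6f}, P(6-6) = {S[(6, 6)]:.6f}")
```

The terminal scores (6, j) and (j, 6) for j ≤ 4, together with (7, 5), (5, 7) and (6, 6), exhaust all outcomes, so their probabilities should sum to one; this is a convenient sanity check on any implementation.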
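In terms of the tiebreaker scores $p_A^T(i,j)$, the probability that the score in a
tiebreaker becomes i points for A, j points for B, with A serving initially, we have

\[ p_A^T = \sum_{j=0}^{5} p_A^T(7,j) + p_A^T(6,6) \sum_{n=0}^{\infty} p_A^T(n+2,n), \tag{11} \]

where, in the last sum, $p_A^T(n+2,n)$ is the probability that A wins by scoring n + 2
while B scores n after the score (6, 6) has been reached.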
Because the sequence of serves in a tiebreaker is A, BB, AA, BB, etc., we have

\[ p_A^T(n+2,n) = \sum_{j=0}^{n} (p_A^R p_B^R)^j (q_A^R q_B^R)^{n-j} \frac{n!}{j!\,(n-j)!}\, p_A^R q_B^R = \left( p_A^R p_B^R + q_A^R q_B^R \right)^n p_A^R q_B^R. \tag{12} \]
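Using (12) in (11) and summing the geometric series gives

\[ p_A^T = \sum_{j=0}^{5} p_A^T(7,j) + p_A^T(6,6)\, p_A^R q_B^R \left[ 1 - p_A^R p_B^R - q_A^R q_B^R \right]^{-1}. \tag{13} \]

The scores $p_A^T(i,j)$ satisfy the following recursion formulas and initial conditions:

For 0 ≤ i, j ≤ 7:

if i − 1 + j = 0, 3, 4, 7, 8, . . . , 4n − 1, 4n, . . .
\[ p_A^T(i,j) = p_A^T(i-1,j)\, p_A^R + p_A^T(i,j-1)\, q_A^R \tag{14} \]
if i − 1 + j = 1, 2, 5, 6, . . . , 4n + 1, 4n + 2, . . .
\[ p_A^T(i,j) = p_A^T(i-1,j)\, q_B^R + p_A^T(i,j-1)\, p_B^R \tag{15} \]
In both cases, omit the j − 1 term if i = 7, j ≤ 6, and omit the i − 1 term if j = 7, i ≤ 6.

Initial conditions:
\[ p_A^T(0,0) = 1; \qquad p_A^T(i,j) = 0 \quad \text{if } i < 0 \text{ or } j < 0. \tag{16} \]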
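The tiebreaker calculation can be sketched the same way. The code below is a minimal illustration that combines the point-by-point recursion with the closed form (13) for play after 6–6; the function name `p_tiebreak` is ours. It assumes the serve order A, BB, AA, BB, ... and does not handle the degenerate case in which both players win every point on serve (or both lose every point on serve), where the denominator in (13) vanishes.

```python
def p_tiebreak(pA, pB):
    """Probability that A wins a 13-point tiebreaker when A serves the first point.

    pA and pB are the probabilities that A and B win a point on their own serve.
    """
    qA, qB = 1.0 - pA, 1.0 - pB
    T = {(0, 0): 1.0}
    for total in range(1, 13):            # points played so far
        for i in range(0, 8):
            j = total - i
            if not 0 <= j <= 7:
                continue
            # Serve order A, BB, AA, BB, ...: A served the last point
            # when i - 1 + j is 0 or 3 modulo 4.
            if (i - 1 + j) % 4 in (0, 3):
                a_wins, b_wins = pA, qA
            else:
                a_wins, b_wins = qB, pB
            from_left = T.get((i - 1, j), 0.0) * a_wins
            from_below = T.get((i, j - 1), 0.0) * b_wins
            if i == 7:
                from_below = 0.0   # tiebreaker was already over at (7, j - 1)
            if j == 7:
                from_left = 0.0    # tiebreaker was already over at (i - 1, 7)
            T[(i, j)] = from_left + from_below
    win_outright = sum(T[(7, j)] for j in range(6))
    # After 6-6 each pair of points has one serve by A and one by B, as in (12).
    after_six_all = pA * qB / (1.0 - pA * pB - qA * qB)
    return win_outright + T[(6, 6)] * after_six_all


if __name__ == "__main__":
    print(f"{p_tiebreak(0.70, 0.60):.4f}")
```

Swapping the two arguments gives the probability that B wins when B serves first, so `p_tiebreak(a, b) + p_tiebreak(b, a)` should equal one if the first server has no advantage.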
Figure 2. The probability $p_A^S$ of player A winning a set plotted as a function of $p_A^R$ for
various values of $p_B^R$. Compiled data from the 2002 U.S. Open Men's Singles event are shown
for the values $p_B^R = 0.50 \pm 0.01$, $p_B^R = 0.60 \pm 0.01$, and $p_B^R = 0.70 \pm 0.01$.
curve marked pBR = 0.50 represents three matches, each of the seven data points
associated with the curve marked pBR = 0.60 represents approximately five
matches, while each of the three data points associated with the curve marked
pBR = 0.70 represents a compilation of approximately seven matches. Given
the relatively small number of sets underlying each of the data points, the
data fit the theoretical curves reasonably well. Figure 3 shows the probability
of player A winning a tiebreaker against player B plotted as a function of
$p_A^R \in [0, 1]$ for the full range of values of $p_B^R$ in increments of 0.1.
3.3. Serving or receiving first
In this section, we prove that there is no theoretical advantage to serving first by
showing that the probability of player A winning the set when serving first, $p_A^S$, is
equal to his probability of winning the set when receiving first, $q_B^S$. For this, we
need formula (6) for $p_A^S$, along with the corresponding formula for $q_B^S$ given by

\[ q_B^S = \sum_{j=0}^{4} p_B^S(j,6) + p_B^S(5,7) + p_B^S(6,6)\, q_B^T. \tag{17} \]

Figure 3. The probability $p_A^T$ of player A winning a tiebreaker plotted as a function of $p_A^R$ for
various values of $p_B^R$.
We obtain the terms $p_B^S(j,i)$ in (17) from $p_A^S(i,j)$ given in the Appendix, by
interchanging $p_A^G \leftrightarrow q_B^G$, $p_B^G \leftrightarrow q_A^G$. From (A.1) and (A.6) it is immediate that
\[ p_A^S(6,0) = p_B^S(0,6) \tag{18} \]
(19)
(20)
\[ \sum_{j=1}^{4} p_A^S(6,j) = \sum_{j=1}^{4} p_B^S(j,6) \tag{21} \]
and that
\[ p_A^T = q_B^T. \tag{22} \]
(23)
and
\[ p_A^S(6,3) + p_A^S(6,4) = p_B^S(3,6) + p_B^S(4,6). \tag{24} \]
pAG ,
qBG
i=0 j=0
1, 6) +
p BS (2n
p BS (2n, 6)
6+2n
6+2n
i j
bijS (n) p GA p GB
(26)
i=0 j=0
for n = 1, 2. Then, it can be shown that the coefficients of each are equal,
i.e., aijS (n) = bijS (n). The values are listed in the Appendix. Figure 4 shows
the probability of obtaining each of the scores that are independent of which
player serves first for the case of evenly matched players.
To prove that $p_A^T = q_B^T$, we use the formula (11) for $p_A^T$ and the corresponding
one for $q_B^T$

\[ q_B^T = \sum_{j=0}^{5} p_B^T(j,7) + p_B^T(6,6) \sum_{n=0}^{\infty} p_B^T(n,n+2). \tag{27} \]

We obtain the terms $p_B^T(j,i)$ in (27) from $p_A^T(i,j)$ given in the Appendix, by
interchanging $p_A^R \leftrightarrow q_B^R$, $p_B^R \leftrightarrow q_A^R$. From (A.14) it is clear that $p_A^T(6,6) = p_B^T(6,6)$.
Furthermore, from the symmetry under exchanging $p_A^R \leftrightarrow q_B^R$, $p_B^R \leftrightarrow q_A^R$ in (12), we have that

\[ p_A^T(n+2,n) = p_B^T(n,n+2). \tag{28} \]

It then remains to show that

\[ \sum_{j=0}^{5} p_A^T(7,j) = \sum_{j=0}^{5} p_B^T(j,7). \tag{29} \]
(30)
(31)
Figure 4. Set scores that are independent of which player serves first, plotted for two equal
players $p_A^R = p_B^R$. (a) $p_A^S(6,0)$, (b) $p_A^S(6,1) + p_A^S(6,2)$, (c) $p_A^S(6,3) + p_A^S(6,4)$, (d) $p_A^S(7,5)$,
and (e) $p_A^S(6,6)$.
(32)
4
4
i j
aijT (n) p AR p BR
(33)
i j
bijT (n) p AR p BR
(34)
i=0 j=0
p TB (2n, 7) + p TB (2n + 1, 7) =
4
4
i=0 j=0
for n = 0, 1, 2. Then, it can be shown that the coefficients are equal, i.e.,
aijT (n) = bijT (n). The values are listed in the Appendix. Figure 5 shows the
probability of obtaining each of the tiebreaker scores that are independent of
which player serves first, for equally matched players.
Figure 5. Tiebreaker scores that are independent of which player serves first, plotted for two
equal players $p_A^R = p_B^R$. (a) $p_A^T(7,0) + p_A^T(7,1)$, (b) $p_A^T(7,2) + p_A^T(7,3)$, (c) $p_A^T(7,4) + p_A^T(7,5)$,
and (d) $p_A^T(6,6)$.
The question of whether to serve or receive first has received some attention
in the literature. In an interesting combinatorial analysis of Kingston [16]
(followed by a note [17]), a simplified scoring system (which he calls a short
set) is considered in which player A serves the first game of a match consisting
of the best N of 2N − 1 games. His striking result is that it does not matter
whether the rules are such that the players alternate serves after each game, or
whether the winner of the previous game continues to serve the next game.
In either case, player A has the same probability of winning. At the end of
the article, he asks how many games need to be played to give two equal
players a reasonably equal chance of winning, whoever starts serving. As a
consequence of the central limit theorem, player A's (approximate) probability
of winning a short set is $\frac{1}{2} + \frac{1}{2} (p_A^R - \frac{1}{2}) [p_A^R (1 - p_A^R)(N - 1)]^{-1/2}$. Figure 2
in his paper shows the slow convergence to 1/2 as N → ∞, giving player A a
distinct advantage, for finite N, if he serves first and $p_A^R > 0.5$. Thus, for best
N of 2N − 1 scoring, there is a theoretical advantage to serving first. For
tennis scoring, the paper of Pollard [8] considers both classical scoring (no
tiebreakers) and tiebreaker scoring, and implicit in his calculations (see, for
example, his Tables 2 and 3) is the fact that $p_A^S = q_B^S$, although the result is not
proven. There are other ways of proving and generalizing the result that do not
rely on the explicit solutions for $p_A^S$ and $q_B^S$ as our proof does. In fact, one can
prove that as long as the scoring system is such that the number of games
served by player A minus the number of games served by player B is −1, 0,
or 1, there is no advantage or disadvantage to serving first. Such scoring
systems are termed service neutral and are discussed in [18].
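The independence of the set result from the identity of the first server can also be checked numerically. The sketch below is our own illustration, not the analytic proof given in this section: it propagates score probabilities forward game by game (and point by point in the tiebreaker) and then compares the two serving orders. All function names are hypothetical, and a tiebreaker is assumed at 6–6.

```python
def p_game(p):
    """Hold probability from formula (5)."""
    q = 1.0 - p
    return p**4 * (1 + 4 * q + 10 * q**2) + 20 * (p * q) ** 3 * p**2 / (1 - 2 * p * q)


def reach_probs(a_wins_unit, target, n_units=12):
    """Forward-propagate score probabilities in a race to `target`, win by two.

    a_wins_unit(k) is A's chance of winning the unit (game or point) that is
    played after k units have been completed.
    """
    R = {(0, 0): 1.0}
    for k in range(n_units):
        for i in range(k + 1):
            j = k - i
            pr = R.get((i, j), 0.0)
            over = max(i, j) >= target and abs(i - j) >= 2
            if pr == 0.0 or over:
                continue
            w = a_wins_unit(k)
            R[(i + 1, j)] = R.get((i + 1, j), 0.0) + pr * w
            R[(i, j + 1)] = R.get((i, j + 1), 0.0) + pr * (1.0 - w)
    return R


def p_tiebreak(pA, pB):
    """A wins the tiebreaker, A serving first (serve order A, BB, AA, ...)."""
    T = reach_probs(lambda k: pA if k % 4 in (0, 3) else 1.0 - pB, target=7)
    win_outright = sum(T.get((7, j), 0.0) for j in range(6))
    after_six_all = pA * (1 - pB) / (1 - pA * pB - (1 - pA) * (1 - pB))
    return win_outright + T.get((6, 6), 0.0) * after_six_all


def p_set(pA, pB):
    """A wins a tiebreaker set, A serving the first game, as in formula (6)."""
    gA, gB = p_game(pA), p_game(pB)
    S = reach_probs(lambda k: gA if k % 2 == 0 else 1.0 - gB, target=6)
    win_outright = sum(S.get((6, j), 0.0) for j in range(5)) + S.get((7, 5), 0.0)
    return win_outright + S.get((6, 6), 0.0) * p_tiebreak(pA, pB)


if __name__ == "__main__":
    a, b = 0.65, 0.60
    serving_first = p_set(a, b)            # A serves first
    receiving_first = 1.0 - p_set(b, a)    # B serves first, A wins
    print(f"{serving_first:.6f} {receiving_first:.6f}")
```

The two printed values should agree to rounding error for any pair of point-winning probabilities strictly between 0 and 1.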
Similarly, when A serves the first game and B serves the last game, the total
number of games is even. For even numbers of games, the right side of (6) yields

\[ p_{AB}^S = \sum_{j=0,2,4} p_A^S(6,j) + p_A^S(7,5). \tag{36} \]
(37)
(38)
j=1,3
\[ q_{AB}^S = \sum_{j=0,2,4} p_A^S(j,6) + p_A^S(5,7). \tag{39} \]
Table 3
Probability $p_A^M$ of Player A Winning a Match of Three Sets out of Five
Values of $p_A^R$ are along the top row and values of $p_B^R$ are down the left column.

pBR \ pAR  0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
0.0        -       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
0.1        0.0000  0.5000  0.9060  0.9956  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
0.2        0.0000  0.0940  0.5000  0.9136  0.9981  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
0.3        0.0000  0.0045  0.0864  0.5000  0.9380  0.9993  1.0000  1.0000  1.0000  1.0000  1.0000
0.4        0.0000  0.0000  0.0019  0.0621  0.5000  0.9513  0.9995  1.0000  1.0000  1.0000  1.0000
0.5        0.0000  0.0000  0.0000  0.0007  0.0487  0.5000  0.9513  0.9993  1.0000  1.0000  1.0000
0.6        0.0000  0.0000  0.0000  0.0000  0.0005  0.0487  0.5000  0.9380  0.9981  1.0000  1.0000
0.7        0.0000  0.0000  0.0000  0.0000  0.0000  0.0007  0.0621  0.5000  0.9136  0.9956  1.0000
0.8        0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0019  0.0864  0.5000  0.9060  1.0000
0.9        0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0045  0.0940  0.5000  1.0000
1.0        0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -
P. K. Newton and J. B. Keller
Then,
\[ q_A^S = q_{AA}^S + q_{AB}^S. \tag{40} \]
To get $p_{BA}^S$, $p_{BB}^S$, $q_{BA}^S$, and $q_{BB}^S$, we interchange A and B in (35)–(40). Note that
because $p_A^S + q_A^S = 1$ and $p_B^S + q_B^S = 1$, we have
\[ p_{AA}^S + q_{AA}^S + p_{AB}^S + q_{AB}^S = 1, \tag{41} \]
\[ p_{BB}^S + q_{BB}^S + p_{BA}^S + q_{BA}^S = 1. \tag{42} \]
pM
AA (i,
\[ p_{AB}^M(i,j) = p_{AB}^M(i-1,j)\, p_{AB}^S + p_{AA}^M(i-1,j)\, q_{BB}^S + p_{AB}^M(i,j-1)\, q_{AB}^S + p_{AA}^M(i,j-1)\, p_{BB}^S, \tag{43} \]
\[ p_{AA}^M(i,j) = p_{AB}^M(i-1,j)\, p_{AA}^S + p_{AA}^M(i-1,j)\, q_{BA}^S + p_{AB}^M(i,j-1)\, q_{AA}^S + p_{AA}^M(i,j-1)\, p_{BA}^S. \tag{44} \]
\[ p_{AA}^M(i,j) = 0 \quad \text{if } i < 0 \text{ or } j < 0 \tag{45} \]
\[ p_{AB}^M(0,0) = 1; \qquad p_{AB}^M(i,j) = 0 \quad \text{if } i < 0 \text{ or } j < 0 \tag{46} \]
\[ p_{AB}^M(1,0) = p_{AB}^S; \quad p_{AB}^M(0,1) = q_{AB}^S; \quad p_{AA}^M(1,0) = p_{AA}^S; \quad p_{AA}^M(0,1) = q_{AA}^S. \tag{47} \]
For the men's format of three sets out of five, (43)–(47) must be solved for
i, j = 0, 1, 2, 3. When j = 3, the i − 1 terms must be omitted; when i = 3,
the j − 1 terms must be omitted. The probability that player A wins a three
out of five set match when serving first is given by

\[ p_A^M = \sum_{j=0}^{2} \left[ p_{AA}^M(3,j) + p_{AB}^M(3,j) \right]. \tag{48} \]

For a match of two sets out of three, (43)–(47) must be solved for i,
j = 0, 1, 2. When j = 2, the i − 1 terms must be omitted; when i = 2, the
j − 1 terms must be omitted. Then, the probability that player A wins a two
out of three set match when serving first is

\[ p_A^M = \sum_{j=0}^{1} \left[ p_{AA}^M(2,j) + p_{AB}^M(2,j) \right]. \tag{49} \]
By using the solutions of (43) and (44) for $p_{AA}^M(2,j)$ and $p_{AB}^M(2,j)$ and
taking advantage of (37) and (40), we can write (49) as

\[ \begin{aligned} p_A^M = {} & p_{AA}^S q_B^S + p_{AB}^S p_A^S + p_{AA}^S p_{BA}^S q_B^S + p_{AA}^S p_{BB}^S p_A^S + p_{AB}^S q_{AA}^S q_B^S \\ & + p_{AB}^S q_{AB}^S p_A^S + q_{AA}^S q_{BA}^S q_B^S + q_{AA}^S q_{BB}^S p_A^S + q_{AB}^S p_{AA}^S q_B^S + q_{AB}^S p_{AB}^S p_A^S. \end{aligned} \tag{50} \]
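Note that because the probability of winning a set is independent of which
player serves first, so that $q_B^S = p_A^S$ and $p_B^S = 1 - p_A^S$, the above formula (50) reduces to

\[ p_A^M = (p_A^S)^2 + 2 (p_A^S)^2 p_B^S \tag{51} \]

for the two out of three set format, and

\[ p_A^M = (p_A^S)^3 + 3 (p_A^S)^3 p_B^S + 6 (p_A^S)^3 (p_B^S)^2 \tag{52} \]

for the three out of five set format.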
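Formulas (51) and (52) are easy to evaluate directly. The short Python sketch below does so for a given set-winning probability; the function name `p_match` and its `best_of` argument are our own, and the probability that B wins a set is taken to be one minus the probability that A does, which relies on the serve-independence result above.

```python
def p_match(p_set, best_of=3):
    """Match-winning probability from the set-winning probability.

    Implements (51) for best_of=3 and (52) for best_of=5, with the
    opponent's set-winning probability taken as 1 - p_set.
    """
    q_set = 1.0 - p_set
    if best_of == 3:
        # Win 2-0, or 2-1 in two possible orders.
        return p_set**2 * (1.0 + 2.0 * q_set)
    if best_of == 5:
        # Win 3-0, 3-1 (3 orders), or 3-2 (6 orders).
        return p_set**3 * (1.0 + 3.0 * q_set + 6.0 * q_set**2)
    raise ValueError("best_of must be 3 or 5")


if __name__ == "__main__":
    for ps in (0.5, 0.6, 0.7):
        print(f"p_S = {ps:.1f}: best of 3 -> {p_match(ps, 3):.4f}, "
              f"best of 5 -> {p_match(ps, 5):.4f}")
```

For a set-winning probability of 0.6, the two out of three format gives 0.648, and the longer format amplifies the stronger player's edge further.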
\[ \mathbf{p}^{(n)} = \left( p_1^{(n)},\ p_2^{(n)},\ p_3^{(n)},\ \ldots,\ p_{128}^{(n)} \right)^{T}. \tag{53} \]

Here, $p_i^{(n)}$ is the conditional probability that player i wins a match in the nth
round, provided that he or she survives to that round of the tournament. From
(48) or (49), we know $p_{ij}^M$, the probability that player i beats player j, which we
write more simply as $P_{ij}$.
Table 4
Probability $p_A^M$ of Player A Winning a Match of Two Sets out of Three
Values of $p_A^R$ are along the top row and values of $p_B^R$ are down the left column.

pBR \ pAR  0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
0.0        -       1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
0.1        0.0000  0.5000  0.8539  0.9820  0.9995  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000
0.2        0.0000  0.1461  0.5000  0.8624  0.9898  0.9999  1.0000  1.0000  1.0000  1.0000  1.0000
0.3        0.0000  0.0180  0.1376  0.5000  0.8909  0.9947  1.0000  1.0000  1.0000  1.0000  1.0000
0.4        0.0000  0.0005  0.0103  0.1091  0.5000  0.9079  0.9961  1.0000  1.0000  1.0000  1.0000
0.5        0.0000  0.0000  0.0001  0.0053  0.0922  0.5000  0.9079  0.9947  0.9999  1.0000  1.0000
0.6        0.0000  0.0000  0.0000  0.0000  0.0039  0.0922  0.5000  0.8909  0.9898  0.9995  1.0000
0.7        0.0000  0.0000  0.0000  0.0000  0.0000  0.0053  0.1091  0.5000  0.8624  0.9820  1.0000
0.8        0.0000  0.0000  0.0000  0.0000  0.0000  0.0001  0.0103  0.1376  0.5000  0.8539  1.0000
0.9        0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0005  0.0180  0.1461  0.5000  1.0000
1.0        0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  -
Figure 6. Theoretical curves for $p_A^G$ (dotted), $p_A^S$ (dashed), and $p_A^M$ (solid) corresponding to
values $p_B^R = 0.60$. Compiled game, set, and match data from the 2002 U.S. Open Men's Singles
event are shown for all matches in which $p_B^R = 0.60 \pm 0.01$.
(n = 1, . . . , 6).
(54)
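Here, $P_n$ is a 128 × 128 matrix with block diagonal structure made up of $2^{7-n}$
blocks. We label them $P_n^{(k)}$, $1 \le k \le 2^{7-n}$, and then $P_n$ is given by

\[ P_n = \begin{pmatrix} P_n^{(1)} & 0 & 0 & \cdots & 0 \\ 0 & P_n^{(2)} & 0 & \cdots & 0 \\ \vdots & 0 & P_n^{(3)} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & P_n^{(2^{7-n})} \end{pmatrix}. \tag{55} \]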
(k)
(56)
(n,k)
P,+12n1
...
P,1
P,
P
...
P+1,1
P+1,
+1,+12n1
(n,k)
.
P, =
(57)
..
..
..
..
.
.
.
.
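\[ P_1 = \begin{pmatrix} P_1^{(1)} & 0 & 0 & \cdots & 0 \\ 0 & P_1^{(2)} & 0 & \cdots & 0 \\ \vdots & 0 & P_1^{(3)} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & P_1^{(64)} \end{pmatrix}. \tag{58} \]

$P_1^{(k)}$ is a 2 × 2 matrix:

\[ P_1^{(k)} = \begin{pmatrix} 0 & P_{2k-1,2k} \\ P_{2k,2k-1} & 0 \end{pmatrix}, \tag{59} \]

so that

\[ P_1^{(1)} = \begin{pmatrix} 0 & P_{12} \\ P_{21} & 0 \end{pmatrix}, \quad P_1^{(2)} = \begin{pmatrix} 0 & P_{34} \\ P_{43} & 0 \end{pmatrix}, \quad \ldots, \quad P_1^{(64)} = \begin{pmatrix} 0 & P_{127,128} \\ P_{128,127} & 0 \end{pmatrix}. \tag{60} \]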
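\[ \mathbf{p}^{TC} = \begin{pmatrix} p_1^{TC} \\ p_2^{TC} \\ p_3^{TC} \\ \vdots \\ p_{128}^{TC} \end{pmatrix} = \begin{pmatrix} \prod_{n=1}^{7} p_1^{(n)} \\ \prod_{n=1}^{7} p_2^{(n)} \\ \prod_{n=1}^{7} p_3^{(n)} \\ \vdots \\ \prod_{n=1}^{7} p_{128}^{(n)} \end{pmatrix}. \tag{61} \]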
The factors in the last column are obtained by solving (54). Note that the
components of the vector pTC must sum to unity.
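For readers who want to experiment, the tournament calculation can be sketched compactly in Python. The code below uses a standard round-by-round formulation for a single elimination draw: it tracks each player's probability of surviving each round, which equals the running product of the conditional round-winning probabilities appearing in (61). It is an illustration only and is not claimed to reproduce the block-matrix bookkeeping of (54)–(60) line by line; the function name and the rating-based example matrix are hypothetical.

```python
def champion_probs(P):
    """Probability that each player wins a single elimination tournament.

    P[i][j] is the probability that player i beats player j. Players are
    listed in draw order (0 meets 1, 2 meets 3, ...), and len(P) must be a
    power of two.
    """
    n = len(P)
    alive = [1.0] * n        # probability of having survived all rounds so far
    size = 1                 # number of players in each sub-bracket
    while size < n:
        nxt = [0.0] * n
        for i in range(n):
            sibling = (i // size) ^ 1        # sub-bracket supplying the opponent
            lo = sibling * size
            # Average over possible opponents, weighted by their survival.
            nxt[i] = alive[i] * sum(alive[j] * P[i][j] for j in range(lo, lo + size))
        alive = nxt
        size *= 2
    return alive


if __name__ == "__main__":
    # Hypothetical ratings, used only to build a matrix with P[i][j] + P[j][i] = 1.
    ratings = [5.0, 1.0, 2.0, 2.0, 3.0, 1.5, 1.0, 4.0]
    P = [[0.0 if i == j else ri / (ri + rj) for j, rj in enumerate(ratings)]
         for i, ri in enumerate(ratings)]
    probs = champion_probs(P)
    print([round(x, 4) for x in probs], round(sum(probs), 6))
```

As noted above for the vector of tournament-champion probabilities, the returned values sum to unity whenever the match probabilities for each pair of players sum to one.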
5.2. Predicting the fate of the semifinalists
Suppose that after the quarterfinal round, we wish to predict the probability of
each of the four semifinalists becoming the tournament champion. We use
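the preceding recursion method, introducing the vectors $\mathbf{p}^{(0)}$, $\mathbf{p}^{(1)}$, and $\mathbf{p}^{(2)}$ of
probabilities of winning the quarterfinal, semifinal, and final round

\[ \mathbf{p}^{(0)} = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 1 \end{pmatrix}, \qquad \mathbf{p}^{(n)} = \begin{pmatrix} p_1^{(n)} \\ p_2^{(n)} \\ p_3^{(n)} \\ p_4^{(n)} \end{pmatrix}, \quad (n = 1, 2). \tag{62} \]

The matrices $P_1$ and $P_2$ are given by

\[ P_1 = \begin{pmatrix} 0 & P_{12} & 0 & 0 \\ P_{21} & 0 & 0 & 0 \\ 0 & 0 & 0 & P_{34} \\ 0 & 0 & P_{43} & 0 \end{pmatrix}, \tag{63} \]

\[ P_2 = \begin{pmatrix} 0 & 0 & P_{13} & P_{14} \\ 0 & 0 & P_{23} & P_{24} \\ P_{31} & P_{32} & 0 & 0 \\ P_{41} & P_{42} & 0 & 0 \end{pmatrix}. \tag{64} \]

The probability that player i wins a semifinal match is the ith component of

\[ \mathbf{p}^{(1)} = P_1 \mathbf{p}^{(0)} = \begin{pmatrix} P_{12} \\ P_{21} \\ P_{34} \\ P_{43} \end{pmatrix}. \tag{65} \]

The probability that player i wins the final match if he or she plays in it is the
ith component of

\[ \mathbf{p}^{(2)} = P_2 \mathbf{p}^{(1)} = \begin{pmatrix} P_{13} P_{34} + P_{14} P_{43} \\ P_{23} P_{34} + P_{24} P_{43} \\ P_{31} P_{12} + P_{32} P_{21} \\ P_{41} P_{12} + P_{42} P_{21} \end{pmatrix}. \tag{66} \]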
j=1
We use this with n = 5 in (5) for each player in the semifinals and then
compute their empirical probabilities of winning a match against any of the
other remaining players. This allows us to compute the entries of the matrices
P1 and P2 in (63), (64), and arrive at values for pTC in round n = 6 for each of
the four semifinalists. To calculate pTC for the two finalists after the semifinal
round match, we repeat the same steps for the two finalists, using (68) with
n = 6. The same method of calculating pTC could be applied after round n = 1,
and after each subsequent round as the tournament progresses to make running
projections regarding tournament outcomes. Other forecasting methods which
allow point by point updates as the match unfolds are described in [19].
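The semifinal recursion with the matrices (63) and (64) can be written out in a few lines of Python. The sketch below is illustrative: the function names are ours, players are indexed from 0 rather than 1, and the match probabilities are generated from hypothetical ratings rather than from tournament data. The champion probabilities are formed as the componentwise product of the semifinal and final round vectors, in the spirit of (61).

```python
def matvec(M, v):
    """Plain matrix-vector product for small lists of lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def semifinal_forecast(P):
    """P[i][j]: probability that semifinalist i beats semifinalist j.

    Players 0 and 1 meet in one semifinal, players 2 and 3 in the other.
    Returns the semifinal vector, the final round vector, and their product.
    """
    P1 = [
        [0.0, P[0][1], 0.0, 0.0],
        [P[1][0], 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, P[2][3]],
        [0.0, 0.0, P[3][2], 0.0],
    ]
    P2 = [
        [0.0, 0.0, P[0][2], P[0][3]],
        [0.0, 0.0, P[1][2], P[1][3]],
        [P[2][0], P[2][1], 0.0, 0.0],
        [P[3][0], P[3][1], 0.0, 0.0],
    ]
    p0 = [1.0, 1.0, 1.0, 1.0]
    p1 = matvec(P1, p0)     # probability of winning the semifinal
    p2 = matvec(P2, p1)     # probability of winning the final, if in it
    p_tc = [a * b for a, b in zip(p1, p2)]
    return p1, p2, p_tc


# Hypothetical ratings, used only to build a matrix with P[i][j] + P[j][i] = 1.
ratings = [3.0, 1.0, 2.0, 2.5]
P = [[0.0 if i == j else ratings[i] / (ratings[i] + ratings[j]) for j in range(4)]
     for i in range(4)]
p1, p2, p_tc = semifinal_forecast(P)

if __name__ == "__main__":
    print([round(x, 4) for x in p_tc], round(sum(p_tc), 6))
```

Rerunning the same function after the semifinals, with the two losers' match probabilities set to zero, gives the updated forecast for the two finalists.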
6.1. Women's Tennis Association (WTA) data
Figure 7 shows the 2002 U.S. Open Women's Singles Draw from the semifinal
round. Under each player, we show the value of $p_i^R(5)$, $p_i^R(6)$, and $p_i^R(7)$.
Next to each player's name is their empirical probability of winning the
upcoming match, $P_{ij}$, as well as their empirical probability of becoming
the tournament champion, $p_i^{TC}$. After the quarterfinal round matches, L.
Davenport would have been the slight favorite to win the tournament
($p_2^{TC} = 0.3599$), followed by V. Williams ($p_4^{TC} = 0.3047$), S. Williams
($p_1^{TC} = 0.2872$) and A. Mauresmo ($p_3^{TC} = 0.0482$), while after the semifinal round
matches, S. Williams ($p_1^{TC} = 0.6527$) was the favorite and ultimately won the
tournament. Figure 8 shows the 2002 Wimbledon Women's Singles Draw from
the semifinal round. Here, V. Williams ($p_1^{TC} = 0.4784$) was the favorite to win
the tournament after the quarterfinal round match, followed by S. Williams
($p_4^{TC} = 0.3834$), A. Mauresmo ($p_3^{TC} = 0.1233$), and J. Henin ($p_2^{TC} = 0.0150$),
while S. Williams ($p_4^{TC} = 0.5866$) was the favorite after the semifinal round
match and ultimately won the tournament.

Figure 7. The probability $P_{ij}$ of each of the four semifinalists in the 2002 U.S. Open
Women's Singles tournament winning her match, and her probability $p_i^{TC}$ of becoming the
tournament champion.
6.2. Association of Tennis Professionals (ATP) data
Figure 9 shows the 2002 U.S. Open Men's Singles Draw. After the quarterfinal
round matches, P. Sampras was the heavy favorite to win the tournament
($p_1^{TC} = 0.6747$), followed by L. Hewitt ($p_4^{TC} = 0.1457$), A. Agassi
($p_3^{TC} = 0.0945$), and S. Schalken ($p_2^{TC} = 0.0851$). Sampras's chances of winning the
tournament increased after his semifinal round match ($p_1^{TC} = 0.8856$) and
he ultimately won the tournament. Figure 10 shows the results from the
2002 Wimbledon Men's Singles event. After their quarterfinal round matches,
X. Malisse ($p_3^{TC} = 0.4573$) was favored to win the tournament, followed by
L. Hewitt ($p_1^{TC} = 0.3364$), T. Henman ($p_2^{TC} = 0.1815$), and D. Nalbandian
($p_4^{TC} = 0.0247$). After the semifinal round matches, it was L. Hewitt, the
ultimate tournament champion, who was the heavy favorite ($p_1^{TC} = 0.8698$).

Figure 8. The probability $P_{ij}$ of each of the four semifinalists in the 2002 Wimbledon
Women's Singles tournament winning her match, and her probability $p_i^{TC}$ of becoming the
tournament champion.

Figure 9. The probability $P_{ij}$ of each of the four semifinalists in the 2002 U.S. Open Men's
Singles tournament winning his match, and his probability $p_i^{TC}$ of becoming the tournament
champion.

Figure 10. The probability $P_{ij}$ of each of the four semifinalists in the 2002 Wimbledon Men's
Singles tournament winning his match, and his probability $p_i^{TC}$ of becoming the tournament
champion.
A more refined analysis than the one described in this paper could incorporate
these and other higher-order effects by allowing $p_A^R$ and $p_B^R$ to vary from point
to point as the match unfolds, depending on the point's importance [12]
or by taking into consideration more detailed player characteristics such as
rallying ability or strength of return of serve. For example, we could define the
probability that player A wins a point on serve as

\[ \tilde{p}_A^R = p_A^R + \epsilon\, p_{AB}^R(i,j), \qquad 0 \le \tilde{p}_A^R \le 1 \tag{69} \]

where $p_A^R$ is constant throughout the match, $p_{AB}^R(i,j)$ represents player A's
probability of winning a point on serve against player B, when the score is i
points for A and j points for B, and $\epsilon \ll 1$ is a small parameter reflecting
the fact that, in most cases, the deviation from iid is small. The goal then
would be to calculate the corresponding formulas for game, set, and match for
each player, i.e., $p_A^G$, $p_A^S$, $p_A^M$, and $p_B^G$, $p_B^S$, $p_B^M$. The leading-order theory
($\epsilon = 0$) is the one described in this paper based on the iid assumption, while
higher-order corrections could be treated perturbatively.
Acknowledgments
This work is supported by the National Science Foundation grants NSF-DMS
9800797 and NSF-DMS 0203581. Useful comments and observations by
J. D'Angelo and G.H. Pollard on an early draft of the manuscript are gratefully
acknowledged. The first author also thanks Andres Figueroa for skillfully
performing Matlab calculations on the models developed in this manuscript as
part of a summer undergraduate research project.
Appendix
The solution of (7)–(10) is

\[ p_A^S(6,0) = (p_A^G)^3 (q_B^G)^3 \tag{A.1} \]

\[ p_A^S(6,1) = 3 (p_A^G)^3 q_A^G (q_B^G)^3 + 3 (p_A^G)^4 p_B^G (q_B^G)^2 \tag{A.2} \]

\[ p_A^S(6,2) = 12 (p_A^G)^3 q_A^G p_B^G (q_B^G)^3 + 6 (p_A^G)^2 (q_A^G)^2 (q_B^G)^4 + 3 (p_A^G)^4 (p_B^G)^2 (q_B^G)^2 \tag{A.3} \]

\[ p_A^S(6,3) = 24 (p_A^G)^3 (q_A^G)^2 p_B^G (q_B^G)^3 + 24 (p_A^G)^4 q_A^G (p_B^G)^2 (q_B^G)^2 + 4 (p_A^G)^2 (q_A^G)^3 (q_B^G)^4 + 4 (p_A^G)^5 (p_B^G)^3 q_B^G \tag{A.4} \]

\[ \begin{aligned} p_A^S(6,4) = {} & 60 (p_A^G)^3 (q_A^G)^2 (p_B^G)^2 (q_B^G)^3 + 40 (p_A^G)^2 (q_A^G)^3 p_B^G (q_B^G)^4 \\ & + 20 (p_A^G)^4 q_A^G (p_B^G)^3 (q_B^G)^2 + 5 p_A^G (q_A^G)^4 (q_B^G)^5 + (p_A^G)^5 (p_B^G)^4 q_B^G \end{aligned} \tag{A.5} \]

\[ \begin{aligned} p_A^S(7,5) = {} & 100 (p_A^G)^3 (q_A^G)^3 (p_B^G)^2 (q_B^G)^4 + 100 (p_A^G)^4 (q_A^G)^2 (p_B^G)^3 (q_B^G)^3 \\ & + 25 (p_A^G)^2 (q_A^G)^4 p_B^G (q_B^G)^5 + 25 (p_A^G)^5 q_A^G (p_B^G)^4 (q_B^G)^2 \\ & + p_A^G (q_A^G)^5 (q_B^G)^6 + (p_A^G)^6 (p_B^G)^5 q_B^G. \end{aligned} \tag{A.6} \]

To obtain $p_A^S(i,j)$ from $p_A^S(j,i)$, we interchange $p_A^G \leftrightarrow q_A^G$ and $p_B^G \leftrightarrow q_B^G$ in
(A.1)–(A.6). Finally, $p_A^S(6,6)$ in (6) is given by

\[ p_A^S(6,6) = 1 - \left\{ \sum_{i=0}^{4} \left[ p_A^S(i,6) + p_A^S(6,i) \right] + p_A^S(7,5) + p_A^S(5,7) \right\}. \tag{A.7} \]
(A.8)
(A.9)
(A.10)
(A.11)
(A.12)
(A.13)
To obtain $p_A^T(j,i)$ from $p_A^T(i,j)$, we interchange $p_A^R \leftrightarrow q_A^R$ and $p_B^R \leftrightarrow q_B^R$ in
(A.9)–(A.13). Finally, $p_A^T(6,6)$ in (13) is given by

\[ p_A^T(6,6) = 1 - \sum_{i=0}^{5} \left[ p_A^T(i,7) + p_A^T(7,i) \right]. \tag{A.14} \]
S
S
a21
(1) = 24, a22
(1) = 36,
S
S
(1) = 9, a31
(1) = 51,
a30
S
(1) = 3,
a40
S
S
a23
(1) = 24, a24
(1) = 6,
S
S
a32
(1) = 99, a33
(1) = 81,
S
S
a41
(1) = 24, a42
(1) = 60,
S
a34
(1) = 24,
S
S
a43
(1) = 60, a44
(1) = 21.
(A.15)
S
(2) = 5,
a10
S
a11
(2) = 25,
S
(2) = 25,
a14
S
a15
(2) = 5,
S
a20
(2) = 16,
S
a21
(2) = 124,
S
a12
(2) = 50,
S
a13
(2) = 50,
S
S
a22
(2) = 336, a23
(2) = 424,
S
S
(2) = 256, a25
(2) = 60
a24
S
a30
(2) = 18,
S
S
a31
(2) = 198, a32
(2) = 696,
S
(2) = 774,
a34
S
a35
(2) = 210
S
(2) = 8,
a40
S
a41
(2) = 124,
S
a33
(2) = 1080,
(A.16)
S
S
a42
(2) = 560, a43
(2) = 1060,
S
S
a44
(2) = 896, a45
(2) = 280
S
(2) = 1,
a50
S
a51
(2) = 25,
S
(2) = 350,
a54
S
a55
(2) = 126.
S
a52
(2) = 150,
S
a53
(2) = 350,
T
T
a31
(1) = 16, a32
(1) = 24,
T
T
a40
(1) = 3, a41
(1) = 16,
T
T
a33
(1) = 16, a34
(1) = 4,
T
T
a42
(1) = 30, a43
(1) = 24,
T
(2) = 10,
a20
T
a21
(2) = 50,
T
a24
(2) = 50,
T
a25
(2) = 10
T
a30
(2) = 24,
T
a31
(2) = 166,
T
a22
(2) = 100,
T
a44
(1) = 7.
(A.17)
T
a23
(2) = 100,
T
T
a32
(2) = 424, a33
(2) = 516,
T
T
a34
(2) = 304, a35
(2) = 70
T
a40
(2) = 18,
T
T
a41
(2) = 166, a42
(2) = 530,
T
(2) = 532,
a44
T
a45
(2) = 140
T
a50
(2) = 4,
T
a51
(2) = 50,
T
T
a54
(2) = 280, a55
(2) = 84.
T
a43
(2) = 774,
T
T
a52
(2) = 200, a53
(2) = 350,
(A.18)
268
T
a10
(3) = 6,
T
a11
(3) = 36,
T
a12
(3) = 90,
T
a14
(3) = 90,
T
a15
(3) = 36,
T
a16
(3) = 6
T
a20
(3) = 25,
T
a21
(3) = 230,
T
a22
(3) = 775,
T
T
a24
(3) = 1175, a25
(3) = 550,
T
a26
(3) = 105
T
a30
(3) = 40,
T
a31
(3) = 510,
T
a32
(3) = 2200,
T
a34
(3) = 4800,
T
T
a35
(3) = 2590, a36
(3) = 560
T
a40
(3) = 30,
T
a41
(3) = 510,
T
a13
(3) = 120,
T
a23
(3) = 1300,
T
a33
(3) = 4500,
T
T
a42
(3) = 2750, a43
(3) = 6750,
T
T
a44
(3) = 8400, a45
(3) = 5180,
T
a46
(3) = 1260
T
a50
(3) = 10,
T
a51
(3) = 230,
T
a52
(3) = 1550,
T
a54
(3) = 6580,
T
T
a55
(3) = 5620, a56
(3) = 1260,
T
a60
(3) = 1,
T
a61
(3) = 36,
T
a62
(3) = 315,
T
T
a64
(3) = 1890, a65
(3) = 1512,
T
a53
(3) = 4550,
T
a63
(3) = 1120,
T
a66
(3) = 462.
(A.19)
References
12. C. MORRIS, The most important points in tennis, in Optimal Strategies in Sport (S. P.
Ladany and R. E. Machol, Eds.), pp. 131–140, Amsterdam: North-Holland, 1977.
13. J. R. MAGNUS and F. J. G. M. KLAASSEN, The effect of new balls in tennis: Four years at
Wimbledon, The Statistician 48:239–246 (1999).
14. F. J. G. M. KLAASSEN and J. R. MAGNUS, How to reduce the service dominance in
tennis? Empirical results from four years at Wimbledon, preprint, 2003.
15. J. R. MAGNUS and F. J. G. M. KLAASSEN, The final set in a tennis match: Four years at
Wimbledon, J. Appl. Stat. 26(4):461–468 (1999).
16. J. G. KINGSTON, Comparison of scoring systems in two-sided competitions, J. Comb.
Theory A 20:357–362 (1976).
17. C. L. ANDERSON, Note on the advantage of first serve, J. Comb. Theory A 23:363 (1977).
18. P. K. NEWTON and G. H. POLLARD, Service neutral scoring strategies for tennis, in
Proceedings of the Seventh Australasian Conference on Mathematics and Computers in
Sport, 2004.
19. F. J. G. M. KLAASSEN and J. R. MAGNUS, Forecasting in tennis, preprint, 2003.
20. J. R. MAGNUS and F. J. G. M. KLAASSEN, On the advantage of serving first in a tennis
set: Four years at Wimbledon, The Statistician 48:247–256 (1999).
21. D. JACKSON and K. MOSURSKI, Heavy defeats in tennis: Psychological momentum or
random effects, Chance 10:27–34 (1997).
UNIVERSITY OF SOUTHERN CALIFORNIA
STANFORD UNIVERSITY
(Received July 21, 2004)