
Match results prediction ability of official ATP singles ranking

Eiji Konaka^{a,∗}

May 18, 2017

Abstract

This paper discusses the match result prediction ability of ATP ranking points, the official ranking points for men's professional tennis players. The structure of the overall ATP World Tour and the ranking point attribution system imply that the ranking point ratio between two players is an essential variable. The match result prediction model is a logistic model in this ratio. The hypothesis is verified using statistics from over 24000 matches played since 2009.

1 Introduction

Match result prediction models have attracted much attention in recent years, and professional tennis is no exception. The paper [Kovalchik(2016)] reviews several prediction models (regression models, pointwise prediction, paired comparison, and so on) and reports the match result prediction ability of each method. Original prediction models have also been extensively reported. For instance:

• Utilizing ranking difference [Kuper et al.(2014)].
• Applying Google PageRank [Brin and Page(1998)] to tennis [Dingle et al.(2013)].
• Calculating the scoring probability during the match [Klaassen and Magnus(2003)].
• Formulating the pairwise comparison problem as a matrix-based method [Dahl(2012)].
• Extending the Bradley-Terry model [Bradley and Terry(1952)] with the effect of court surfaces [McHale and Morton(2011)].

∗ a: Meijo University, 1-501 Shiogamaguchi, Tenpaku-ku, Nagoya, JAPAN. [email protected]

Surprisingly, there is no report on the statistical properties and the match prediction ability of the official ATP (Association of Tennis Professionals) ranking points. Where the ranking is mentioned at all, the very simple prediction "the player with the higher ranking will win" is used only as a baseline against the model proposed by the authors of each paper. Some authors use the ranking gap and/or the ranking point gap as explanatory variables in a regression model. In these cases, the qualitative meaning of the obtained coefficients and of the model itself has not been fully explained.

The objective of this paper is to analyze how well the official ranking points can explain match results. The main results of this paper are as follows:

• The ATP ranking, the official ranking point system for men's professional tennis, can predict the match result probability via a logistic function with the ranking point ratio between two players as its variable.
• The match prediction ability is based on the consistent design of the tour structure and the point attribution system.

2 Design of ATP ranking points

This section reviews the design of ATP ranking points. A hypothesis that the ratio of the ranking points is essential can be derived from their definitions.

2.1 ATP world tour and ATP rankings

ATP renovated the overall tour structure in 2009. The ranking point system was redesigned at the same time. Professional tennis players (members of ATP) participate in tournaments hosted in various places all over the world. This series of tournaments is called the "ATP world tour". Each tournament is classified into one of the tournament categories. Each tournament category has a different draw size, prize money, and number of ATP ranking points. Large and famous tournaments, e.g., the four Grand Slams, award more ranking points. Table 1 lists the main tournament categories of the ATP world tour in 2017. ATP ranking points are also awarded in tournaments such as the ATP challenger tour and the ITF¹ futures series, which are not listed in Table 1.

A player's ATP ranking points are basically the total points he accrued in 18 tournaments over the past 52 weeks. The top 30 players of the previous year have to participate in the four Grand Slams and 8 Masters tournaments. The details of the exceptions are not covered in this paper; please refer to [ATP World Tour(2016)].

¹ International Tennis Federation.

Table 1: Tournament categories

| Category name | Number of tournaments | ATP ranking points for winner | Draws (tournaments) | Governing body |
|---|---|---|---|---|
| Grand Slam | 4 | 2000 | 128(4) | ITF |
| ATP World Tour Masters 1000 | 9 | 1000 | 96(2), 56(6), 48(1) | ATP |
| ATP World Tour 500 | 13 | 500 | 48(2), 32(11) | ATP |
| ATP World Tour 250 | 40 | 250 | 48(1), 32(2), 28(37) | ATP |
| ATP Finals | 1 | 1100 – 1500 | 8(1) | ATP |

2.2 Design of ATP ranking points and tournament structure

Table 1 shows that the ratio of the winner's ranking points between adjacent tournament categories is constant: the winner's ranking points are 250, 500, 1000, and 2000, doubling with each step up in category. Table 2 shows the detailed ranking points for each result. In each tournament, one more win multiplies the ranking points by 2 or by 5/3.

Table 2: Ranking points in each tournament category [ATP World Tour(2016)]

| Category name | W | F | SF | QF | R16 | R32 | R64 | R128 | Q |
|---|---|---|---|---|---|---|---|---|---|
| Grand Slam | 2000 | 1200 | 720 | 360 | 180 | 90 | 45 | 10 | 25 |
| ATP World Tour Masters 1000 | 1000 | 600 | 360 | 180 | 90 | 45 | 10(25) | (10) | 16 |
| ATP World Tour 500 | 500 | 300 | 180 | 90 | 45 | (20) | | | 20 |
| ATP World Tour 250 | 250 | 150 | 90 | 45 | 20 | (5) | | | 12 |
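The "2 or 5/3" pattern can be checked directly from Table 2. The following is a minimal sketch, not part of the original analysis: the dictionary `POINTS` and the helper `win_ratios` are names introduced here for illustration, and only the unbracketed main-draw values down to the last round with a plain entry are used (bracketed and qualifying values are left out as an assumption).

```python
from fractions import Fraction

# Ranking points per round copied from Table 2, from W downwards.
# Assumption: bracketed values and qualifying (Q) points are omitted.
POINTS = {
    "Grand Slam": [2000, 1200, 720, 360, 180, 90, 45],
    "Masters 1000": [1000, 600, 360, 180, 90, 45],
    "ATP 500": [500, 300, 180, 90, 45],
    "ATP 250": [250, 150, 90, 45, 20],
}


def win_ratios(points):
    """Ratio of the points for each round to the points for the round below it."""
    return [Fraction(hi, lo) for hi, lo in zip(points, points[1:])]


for name, pts in POINTS.items():
    print(name, [str(r) for r in win_ratios(pts)])
```

With these inputs every step is either 5/3 or 2, except the step from R16 to QF in the 250 series (20 to 45 points), which is 9/4. This is in line with the text's later wording that the points grow at "almost" the same rate with each victory.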

Top players are seeded so that they do not meet in the early rounds. The procedure for a 32-draw tournament with 8 seeded players is as follows [ATP(2017)]:

• Make a simple tournament bracket for 32 players. The places are numbered 1, 2, ..., 32 from the top.
• The 8 seeded players are determined based on the ATP ranking. The 1st and 2nd seeds are placed at numbers 1 and 32.
• The 3rd to 8th seeds are grouped as follows:
  – 3rd and 4th seeds: 9 or 24.
  – 5th to 8th seeds: 8, 16, 17, or 25.
  Placements within each group are determined by ballot.

As a result, the 1st and 2nd seeds, the 1st to 4th seeds, and the 1st to 8th seeds cannot meet before the Final (F), the Semifinals (SF), and the Quarter-finals (QF), respectively. If the draw is larger than 32, e.g., 64 or 128, the seeded players are placed in a similar manner.
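The claim about the earliest possible meeting round can be verified mechanically. The sketch below assumes a standard single-elimination bracket in which adjacent places (1 vs 2, 3 vs 4, ...) play in the first round and winners of adjacent matches meet in the next; the function names `meeting_round` and `earliest_meeting` are introduced here only for illustration.

```python
from itertools import combinations


def meeting_round(p, q):
    """Earliest round (1 = first round) in which bracket places p and q can meet.

    Places are numbered 1..N from the top of a single-elimination bracket,
    where N is a power of two and adjacent places play each other first.
    """
    a, b = p - 1, q - 1
    rnd = 1
    while a // 2 != b // 2:
        a, b = a // 2, b // 2
        rnd += 1
    return rnd


# Round names for a 32-draw bracket.
ROUND_NAMES = {1: "R32", 2: "R16", 3: "QF", 4: "SF", 5: "F"}


def earliest_meeting(places):
    """Earliest round in which any two of the given places can meet."""
    return min(meeting_round(p, q) for p, q in combinations(places, 2))


top2 = [1, 32]
top4 = top2 + [9, 24]
top8 = top4 + [8, 16, 17, 25]
for label, places in [("seeds 1-2", top2), ("seeds 1-4", top4), ("seeds 1-8", top8)]:
    print(label, "cannot meet before", ROUND_NAMES[earliest_meeting(places)])
```

Because the result depends only on which places are occupied, not on which seed the ballot assigns to which place within a group, the check holds for every possible ballot outcome.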

2.3 Hypothesis on relation between ranking point and winning probability

The following hypothesis can be stated based on the consistent design of the ranking points, the mandatory tournaments for top players, and the placement of seeded players in each tournament, all described above.

Hypothesis:

• The logarithm of the ranking points shows how high the player can climb in a tournament bracket, because the attributed ranking points grow at almost the same rate with each victory.
• At the same time, the gap in ability between two players is a function of the ratio of their ranking points.
  – The predicted winning probability converges to 1 when the ratio is large, and is expected to be 0.5 if the ratio is 1. Therefore, the predicted winning probability can be modeled by the logistic function

$$p(x) = \frac{x^{\alpha}}{1 + x^{\alpha}}, \quad \alpha > 0. \tag{1}$$

• The top 30 players of the previous year have to participate in the four Grand Slams and 8 Masters tournaments. They have to choose at least 6 more tournaments from the 500 or 250 series because the ranking points are the sum of the points earned in 18 tournaments.
  – The greater part of the ranking points of the top players is determined by the results of the Grand Slams and Masters tournaments because all top players play in these tournaments.
  – It is not rational to participate in 6 (or more) 500 series tournaments because the other top players will also participate in these tournaments.
  – If every top 30 player participates in 3 tournaments of the 500 series, each 500 series tournament has about 7 of the top 30 players on average (30 × 3 / 13), because 13 tournaments of the 500 series are held in one year. Therefore, QF in a 500 series tournament is equivalent to R32 in a Grand Slam. Both award 90 ranking points.
  – If every top 30 player participates in 3 tournaments of the 250 series, each 250 series tournament has about 2.25 of the top 30 players on average (30 × 3 / 40), because 40 tournaments of the 250 series are held in one year. Therefore, SF in a 250 series tournament is equivalent to R32 in a Grand Slam. Both award 90 ranking points.

Based on these insights, the (absolute) ranking and the expected ranking points in each tournament category are listed in Table 3.

Table 3: Ranking and expected ranking points

| Ranking | 1 | 2 | 4 | 8 | 16 | 32 | 64 |
|---|---|---|---|---|---|---|---|
| Grand Slam (4) | 2000 | 1200 | 720 | 360 | 180 | 90 | 45 |
| ATP World Tour Masters 1000 (8) | 1000 | 600 | 360 | 180 | 90 | 45 | 25 |
| ATP World Tour 500 (3) | | | 500 | 300 | 180 | 90 | 45 |
| ATP World Tour 250 (3) | | | | 250 | 150 | 90 | 45 |
| Expected ranking points | | | | | 2430 | 1260 | 650 |

An ideal player at ranking 32 will gain 90×4 + 45×8 + 90×3 + 90×3 = 1260 ranking points. By the same reasoning, ideal players at rankings 16 and 64 will gain 2430 and 650 points. The value 2430 should be regarded as an upper bound for rank 16, because the player at rank 16 cannot always win against the player at rank 32. On the other hand, 650 should be regarded as a lower bound for rank 64, because lower-ranked players can also obtain ranking points in lower categories such as the ATP challenger tour. This hypothesis will be verified below.
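The arithmetic behind the expected ranking points can be reproduced in a few lines. This is a sketch under the schedule assumed in the hypothesis (4 Grand Slams, 8 Masters, 3 tournaments each in the 500 and 250 series); the names `POINTS_PER_EVENT`, `EVENTS_PLAYED`, `expected_points` and `top30_per_event` are illustrative and not from the paper.

```python
# Points an idealised player at each ranking earns per tournament, from Table 3:
# (Grand Slam, Masters 1000, ATP 500, ATP 250).
POINTS_PER_EVENT = {
    16: (180, 90, 180, 150),
    32: (90, 45, 90, 90),
    64: (45, 25, 45, 45),
}

# Assumed schedule from the hypothesis: 4 + 8 + 3 + 3 = 18 countable tournaments.
EVENTS_PLAYED = (4, 8, 3, 3)


def expected_points(rank):
    """Sum of per-tournament points weighted by the number of tournaments played."""
    return sum(p * n for p, n in zip(POINTS_PER_EVENT[rank], EVENTS_PLAYED))


def top30_per_event(events_each, events_in_series, players=30):
    """Average number of top-30 players entered in one tournament of a series."""
    return players * events_each / events_in_series


for rank in (16, 32, 64):
    print(rank, expected_points(rank))
print("500 series:", top30_per_event(3, 13))
print("250 series:", top30_per_event(3, 40))
```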

3 Analysis result

This section shows the analysis results based on the ranking data disclosed by ATP and the database maintained by [Sackmann(2017)].

3.1 Number of participating tournaments and ranking points

The top 300 players in the official ATP ranking [ATP Rankings(2017)] are examined. Figures 1 and 2 show the number of tournaments played and the ranking points, broken down by tournament category. Figures 3 and 4 show the same data for the top 64 players.

Figure 1: Number of tournaments played in previous 52 weeks. [Plot: number of tournaments played (0 to 45) versus ranking (0 to 300), by category: Finals, GS, 1000, 500, Other, UC. GS: Grand Slams, UC: Uncountable.]

Table 4 shows the number of 500 and 250 series tournaments played by the top-ranked players. As expected in the previous section, the top 30 players play about 3 tournaments in each of the 500 and 250 series on average. Higher-ranked players play more 500 series tournaments, whereas lower-ranked players play more 250 series tournaments.

Figure 5 shows the ranking points at ranks 16, 32, and 64 from 2010. Table 5 lists some basic statistics. This result shows that the hypothesis that

• 2430 ranking points give an upper bound for rank 16,
• 1260 ranking points give an estimate for rank 32, and
• 650 ranking points give a lower bound for rank 64

cannot be rejected and contains no large errors. In particular, 1260 points is a useful estimate for rank 32.

Figure 2: Ranking points. [Plot: ranking points (0 to 14000) versus ranking (0 to 300), by category: Finals, GS, 1000, 500, Other, UC. GS: Grand Slams, UC: Uncountable.]

Figure 6 shows the ratio to the ranking point at rank 32. Table 6 shows some statistics. The ranking point ratios at rank 16 and 64 to rank 32 are nearly 1.6 and 0.6, respectively.
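The "Expected" ratios follow directly from the expected points 2430, 1260 and 650, and the reported mean ratios can be compared against them. The sketch below uses only numbers quoted in the text and in Table 6; the reading of the product of the rank 16 and rank 64 ratios as a check that rank 32 sits near their geometric mean is an interpretation added here, not a statement from the paper.

```python
# Expected ranking points at ranks 16, 32 and 64 (Table 3).
EXPECTED = {16: 2430, 32: 1260, 64: 650}

# Mean ratios to rank 32 as reported in Table 6.
MEAN_RATIO = {16: 1.6445, 32: 1.0, 64: 0.6166}


def ratio_to_rank32(points):
    """Divide every entry by the rank-32 value."""
    return {rank: pts / points[32] for rank, pts in points.items()}


expected_ratio = ratio_to_rank32(EXPECTED)

# If points fall by a constant factor each time the rank doubles,
# the product of the rank-16 and rank-64 ratios equals 1.
product = MEAN_RATIO[16] * MEAN_RATIO[64]

print({rank: round(r, 4) for rank, r in expected_ratio.items()})
print(round(product, 3))
```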

3.2 Winning probability and ranking point ratio

The previous section shows that the ratio between ranking points is an essential value. Figure 7 shows the match results from 2009 to 2015. It includes about 20000 matches in the ATP world tour, the Davis Cup, and the Olympic Games. The horizontal axis is the ranking point ratio between the players, $\pi_{i,j} = r_i / r_j$, where $r_i$ and $r_j$ are the ranking points of players $i$ and $j$. The vertical axes of the top and bottom panels are the match results and the number of matches, respectively. Match results are excluded if at least one player has 0 ranking points. The ratio lies between 0.1 and 10 in the majority of the matches. The model $\hat{p}_{i,j} = \pi_{i,j} / (1 + \pi_{i,j})$ gives an approximation; however, it overestimates the winning probability of higher-ranked players.

Figure 3: Number of tournaments played in previous 52 weeks (top 64 players). [Plot: number of tournaments played (0 to 45) versus ranking (0 to 60), by category: Finals, GS, 1000, 500, Other, UC. GS: Grand Slams, UC: Uncountable.]

Here, assume the model

$$\hat{p}_{i,j} = \frac{\pi_{i,j}^{\alpha}}{1 + \pi_{i,j}^{\alpha}}, \quad \alpha > 0, \tag{2}$$

and find the parameter $\alpha$ that minimizes the difference between the match results $w$ and the predicted winning probability $\hat{p}$, i.e., minimize

$$E^2 = \frac{1}{\text{number of matches}} \sum_{\text{for all matches}} \left( w_{i,j} - \hat{p}_{i,j} \right)^2. \tag{3}$$

$\alpha = 0.8722$ gives the minimum $E^2 = 0.2052$. The bold line in Figure 7 illustrates (2). As a comparison, the simplest prediction, "the player with the higher ranking will win", gives $E^2 = 0.3227$. In Figure 8, the horizontal axis is replaced with the predicted winning probability defined in (2). A similar analysis is done for about 3400 match results in 2016 and 2017. The result is $\alpha = 0.8667$ and $E^2 = 0.2065$. This shows that (2) remains a consistent prediction model in recent years.

Figure 4: Ranking points (top 64 players). [Plot: ranking points (0 to 14000) versus ranking (0 to 60), by category: Finals, GS, 1000, 500, Other, UC. GS: Grand Slams, UC: Uncountable.]
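The fitting step in (2) and (3) can be sketched as a one-dimensional least-squares search. The example below is only an illustration under stated assumptions: it uses synthetic matches generated from model (2) itself (hypothetical data, not the ATP match database), a plain grid search as a stand-in for whatever optimiser was actually used, and helper names (`predict`, `fit_alpha`, `synthetic_matches`, `higher_ranked_wins_error`) introduced here.

```python
import math
import random


def predict(log_ratio, alpha):
    """Model (2) in terms of ln(pi): pi^a / (1 + pi^a) = 1 / (1 + exp(-a * ln(pi)))."""
    return 1.0 / (1.0 + math.exp(-alpha * log_ratio))


def mean_squared_error(matches, alpha):
    """E^2 of (3); matches is a list of (ln(pi), w) with w = 1 if player i won."""
    return sum((w - predict(x, alpha)) ** 2 for x, w in matches) / len(matches)


def fit_alpha(matches, lo=0.1, hi=2.0, step=0.01):
    """Grid search for the alpha that minimises E^2 (assumed search range)."""
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n)]
    return min(grid, key=lambda a: mean_squared_error(matches, a))


def synthetic_matches(n, true_alpha, seed=0):
    """Hypothetical data: ln(pi) uniform on [ln 0.1, ln 10], outcomes drawn from (2)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        x = rng.uniform(math.log(0.1), math.log(10.0))
        w = 1 if rng.random() < predict(x, true_alpha) else 0
        out.append((x, w))
    return out


def higher_ranked_wins_error(matches):
    """E^2 of the baseline that gives probability 1 to the player with more points."""
    return sum((w - (1.0 if x > 0 else 0.0)) ** 2 for x, w in matches) / len(matches)


matches = synthetic_matches(5000, true_alpha=0.87)
alpha_hat = fit_alpha(matches)
print(alpha_hat, mean_squared_error(matches, alpha_hat), higher_ranked_wins_error(matches))
```

On real data, `matches` would be built from each match's ranking point ratio and outcome after dropping matches in which a player has 0 points, as described above. The baseline always has the larger error because, for any probability p, p(1 − p) is no greater than min(p, 1 − p).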

4 Conclusion

This paper shows that the following hypothesis can be drawn from the definition of the ATP ranking in use since 2009.

• ATP ranking points, the official ranking point system for professional tennis players, can predict the winning probability with the model

$$\hat{p}_{i,j} = \frac{\pi_{i,j}^{\alpha}}{1 + \pi_{i,j}^{\alpha}}, \quad \alpha = 0.8722, \tag{4}$$

where $\pi_{i,j} = r_i / r_j$ is the ranking point ratio between players $i$ and $j$.

• This is made possible by the consistent design of the overall tournament structure, including the tournament categories, the winner's ranking points, the seeding design, and so on.

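Table 4: Number of ATP 500 and 250 tournaments played by top players (each cell is the number of players who played the given number of tournaments)

| Group | Series | 0 | 1 | 2 | 3 | 4 | 5 | 6 or more | Mean |
|---|---|---|---|---|---|---|---|---|---|
| Top 8 | ATP World Tour 500 | 0 | 0 | 2 | 3 | 3 | 0 | 0 | 3.125 |
| Top 8 | ATP World Tour 250 | 0 | 4 | 2 | 1 | 1 | 0 | 0 | 1.875 |
| Top 16 | ATP World Tour 500 | 0 | 0 | 5 | 7 | 3 | 1 | 0 | 3.000 |
| Top 16 | ATP World Tour 250 | 0 | 4 | 4 | 6 | 2 | 0 | 0 | 2.375 |
| Top 30 | ATP World Tour 500 | 0 | 1 | 11 | 13 | 4 | 1 | 0 | 2.767 |
| Top 30 | ATP World Tour 250 | 0 | 4 | 4 | 7 | 10 | 3 | 2 | 3.333 |
| Top 64 | ATP World Tour 500 | 2 | 7 | 25 | 22 | 6 | 2 | 0 | 2.453 |
| Top 64 | ATP World Tour 250 | 0 | 4 | 4 | 9 | 16 | 8 | 23 | 5.703 |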

Table 5: Ranking points at rank 16, 32, and 64

| Rank | Expected | 2017.03.20 | Maximum | Mean | Minimum | Std∗ |
|---|---|---|---|---|---|---|
| 16 | 2430 | 2425 | 2425 | 2009.8 | 1685 | 138.40 |
| 32 | 1260 | 1265 | 1395 | 1224.4 | 1102 | 61.04 |
| 64 | 650 | 773 | 816 | 753.5 | 665 | 26.38 |

∗ Std: standard deviation

The hypothesis is validated using over 23000 match results from 2009 to 2017. The match results show that the prediction model can explain the outcomes over a wide range of ranking point ratios between players.
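As a usage illustration, model (4) can be evaluated for any pair of ranking points. The sketch below assumes the reported value α = 0.8722; the function name `win_probability` and the choice of inputs (the expected points at ranks 16, 32 and 64 from Table 3) are illustrative only. Non-positive points are rejected, mirroring the exclusion of matches in which a player has 0 ranking points.

```python
ALPHA = 0.8722  # fitted value reported for the 2009-2015 matches


def win_probability(points_i, points_j, alpha=ALPHA):
    """Probability that player i beats player j under model (4)."""
    if points_i <= 0 or points_j <= 0:
        raise ValueError("ranking points must be positive")
    x = (points_i / points_j) ** alpha
    return x / (1.0 + x)


# Illustrative inputs: expected points at ranks 16, 32 and 64 (Table 3).
p_16_vs_32 = win_probability(2430, 1260)
p_32_vs_64 = win_probability(1260, 650)
print(round(p_16_vs_32, 3), round(p_32_vs_64, 3))
```

By construction the two players' probabilities sum to 1, and equal ranking points give a probability of 0.5.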

References

[ATP(2017)] ATP. 2017 ATP Official Rulebook, chapter 7, page 106. 2017.

[ATP Rankings(2017)] ATP Rankings. Emirates ATP rankings. 2017.03.20 edition. http://www.atpworldtour.com/en/rankings/singles?rankDate=2017-03-20, 2017.

[ATP World Tour(2016)] ATP World Tour. Rankings FAQ. http://www.atpworldtour.com/en/rankings/rankings-faq, 2016. Referred in 2016/6/14.

[Bradley and Terry(1952)] Ralph Allan Bradley and Milton E. Terry. Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4):324–345, 1952. ISSN 00063444. URL http://www.jstor.org/stable/2334029.

[Brin and Page(1998)] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst., 30(1–7):107–117, April 1998. ISSN 0169-7552. doi: 10.1016/S0169-7552(98)00110-X. URL http://dx.doi.org/10.1016/S0169-7552(98)00110-X.

[Dahl(2012)] Geir Dahl. A matrix-based ranking method with application to tennis. Linear Algebra and its Applications, 437(1):26–36, 2012. ISSN 0024-3795. doi: http://dx.doi.org/10.1016/j.laa.2012.02.002. URL http://www.sciencedirect.com/science/article/pii/S0024379512001218.

[Dingle et al.(2013)] Nicholas Dingle, William Knottenbelt, and Demetris Spanias. On the (page) ranking of professional tennis players. In Mirco Tribastone and Stephen Gilmore, editors, Computer Performance Engineering: 9th European Workshop, EPEW 2012, Munich, Germany, July 30, 2012, and 28th UK Workshop, UKPEW 2012, Edinburgh, UK, July 2, 2012, Revised Selected Papers, pages 237–247. Springer Berlin Heidelberg, 2013. ISBN 978-3-642-36781-6.

[Klaassen and Magnus(2003)] Franc J.G.M. Klaassen and Jan R. Magnus. Forecasting the winner of a tennis match. European Journal of Operational Research, 148(2):257–267, 2003. ISSN 03772217. doi: http://dx.doi.org/10.1016/S0377-2217(02)00682-3. URL http://www.sciencedirect.com/science/article/pii/S0377221702006823.

[Kovalchik(2016)] Stephanie Ann Kovalchik. Searching for the GOAT of tennis win prediction. Journal of Quantitative Analysis in Sports, 12(3):127–138, 2016.

[Kuper et al.(2014)] Gerard Kuper, Gerardus Sierksma, and Frits Spieksma. Using tennis rankings to predict performance in upcoming tournaments. Technical report, University of Groningen, SOM Research School, 2014.

[McHale and Morton(2011)] Ian McHale and Alex Morton. A Bradley-Terry type model for forecasting tennis match results. International Journal of Forecasting, 27(2):619–630, 2011. ISSN 01692070. doi: http://dx.doi.org/10.1016/j.ijforecast.2010.04.004. URL http://www.sciencedirect.com/science/article/pii/S0169207010001019.

[Sackmann(2017)] Jeff Sackmann. https://github.com/JeffSackmann, 2017. URL https://github.com/JeffSackmann. Retrieved in 2017/4/4.

Figure 5: Ranking points at rank 16, 32, and 64. [Plot: ATP ranking points (0 to 3000) versus date (2011 to 2017); series: Rank 16, Rank 32, Rank 64, Expected.]

Figure 6: Ranking points ratio to rank 32. [Plot: ATP ranking points ratio to rank 32 (0.6 to 1.8) versus date (2011 to 2017); series: Rank 16, Rank 32, Rank 64, Rank 16 x Rank 64.]

Table 6: Ranking points ratio to rank 32

| Rank | Expected | 2017.03.20 | Maximum | Mean | Minimum | Std∗ |
|---|---|---|---|---|---|---|
| 16 | 1.9286 | 1.9246 | 1.9737 | 1.6445 | 1.3699 | 0.1288 |
| 32 | 1 | 1 | 1 | 1 | 1 | 0 |
| 64 | 0.5159 | 0.6135 | 0.7056 | 0.6166 | 0.5190 | 0.0330 |

∗ Std: standard deviation

Figure 7: Ranking point ratio to winning probability (ATP). [Plot title: Ranking point ratio to winning probability (ATP, 2009-2015). Top panel: winning probability (0 to 1) versus ranking point ratio π (logarithmic axis, 10^-2 to 10^2); series: Result, π/(1+π), Fitted curve. Bottom panel: number of matches versus ranking point ratio π.]

Figure 8: Predicted winning probability to result (ATP). [Plot title: Predicted winning probability to winning probability (ATP, 2009-2015). Top panel: winning probability (0 to 1) versus predicted winning probability (0 to 1); series: Predicted, Result. Bottom panel: number of matches versus predicted winning probability.]

[Plot: ranking points ratio (0 to 1.2) versus ranking, by category: Finals, GS, 1000, 500, Other, UC.]