Analytics like Elo ratings open up alternate ways of understanding and comparing pro player performance
February 13, 2018 by Aaron Howard in Analysis with 0 comments
Many of us who are interested in disc golf would love to see the sport enter the mainstream, and it seems to be moving in the right direction thanks to the ever-growing PDGA membership base.
However popular disc golf is becoming, it is generally lacking in one area in which professional sports like the NBA, PGA Tour, and, particularly, MLB have excelled recently: analytics. There has been growth in this area recently with the statistics provided by UDisc Live, and PDGA ratings have been around for a while. But, generally speaking, disc golf has some catching up to do when it comes to using big data to push our understanding of the sport forward.
I hope to help with the genesis of Elo ratings for professional disc golfers. Elo ratings are a simple mathematical tool used for comparing players.1 They have become popular for comparing teams or individuals in sports. For example, the website FiveThirtyEight uses Elo ratings to rank and make predictions regarding NBA and NFL games.
Elo ratings are very popular because they are easy to calculate and easy to understand. Basically, each player’s rating starts with the same baseline value, such as 1500, which is then modified according to how the player scores as compared to all the other players for a given round. If a player scores well, their rating goes up, and vice versa. I provide more of the dirty details in the footnotes.2 But for now, let’s get right to the results.
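Mechanically, the update can be sketched in a few lines of Python. This toy version treats one round as a set of pairwise Elo comparisons against everyone else in the field; it illustrates the general idea, not the exact equation used for these ratings (that is spelled out in the footnotes):

```python
K = 20  # volatility parameter; the same value used in the footnotes

def expected(r_a, r_b):
    """Standard Elo expectation: probability that player A beats player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update_round(ratings, scores):
    """Update every rating after one round.

    ratings and scores are dicts keyed by player name; lower stroke
    totals win. Each player is compared pairwise against every other
    player in the field, and the average surprise (actual result minus
    expected result) is scaled by K.
    """
    n = len(ratings) - 1  # opponents per player
    new = {}
    for p in ratings:
        actual = expect = 0.0
        for q in ratings:
            if q == p:
                continue
            # 1 for a win, 0.5 for a tie, 0 for a loss against this opponent
            if scores[p] < scores[q]:
                actual += 1.0
            elif scores[p] == scores[q]:
                actual += 0.5
            expect += expected(ratings[p], ratings[q])
        new[p] = ratings[p] + K * (actual - expect) / n
    return new
```

For example, with three players all starting at the 1500 baseline, the player who beats the other two gains 10 points while the two who tie each other lose 5 apiece, so the field's total rating is conserved in that round.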
I have calculated Elo ratings for the 2017 MPO and FPO seasons. These ratings include the 595 MPO players and 104 FPO players who competed in PDGA Majors and NTs (35 total rounds). The figures show how the ratings of all 595 and 104 players, respectively, changed over the 35 rounds. They are pretty, but you cannot really learn anything from them.
For more clarity, I also generated tables of the top 25 rated players. I ranked them based on the harmonic mean of their average, maximum, and season end ratings, but the tables are also sortable by all four measures. Each of these ratings has value and tells you something worthwhile. But why focus on the harmonic mean? Because it is more sensitive to lower values and, therefore, penalizes players for not being consistent (mean), good (maximum), and/or a strong finisher (season end).
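For the curious, the ranking statistic is just the harmonic mean of the three measures; a minimal sketch:

```python
def harmonic_mean(values):
    """Harmonic mean: n divided by the sum of reciprocals.

    Unlike the ordinary average, it is pulled toward the smallest
    value, so one weak measure drags the whole ranking statistic down.
    """
    return len(values) / sum(1.0 / v for v in values)
```

Plugging in the three measures from a row of the FPO table, `harmonic_mean([1503.4, 1506.7, 1506.7])` comes out to roughly 1505.6, matching the table.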
You probably recognize some of the players on these tables. At the top of the MPO ranking is, of course, Ricky Wysocki, who had the consensus “best” season. He had both the highest mean and end of season rating. Close behind is Paul McBeth, who had the highest maximum rating of the season after his transcendent comeback in the final round of the European Open (sorry Gregg Barsby!). Unfortunately, his struggles early on at the USDGC hurt his mean and end of season ratings.
[Table: Top 25 MPO players, with columns Player, Mean, Maximum, End of Season, and Harmonic Mean]
At the top of the FPO ranking is Catrina Allen, which came as a bit of a surprise to me. Paige Pierce had the highest maximum and mean rating, but Catrina’s hot play at the Pittsburgh Flying Disc Open and the Hall of Fame Classic propelled her season end rating and harmonic mean above Pierce’s. For both MPO and FPO, ratings fall off a little after the top two players.
| Player | Mean | Maximum | End of Season | Harmonic Mean |
| --- | --- | --- | --- | --- |
| Ragna Bygde Lewis | 1503.4 | 1506.7 | 1506.7 | 1505.6 |
| Vanessa Van Dyken | 1500.2 | 1503.7 | 1502.8 | 1502.3 |
If you compare these ratings to those given by the PDGA, you will see a great deal of consistency. This makes sense because there are some conceptual similarities between PDGA player ratings and Elo ratings. For example, both use standardized round scores in their calculations. PDGA ratings use Scratch Scoring Average (SSA) and Elo ratings use the scores and ratings of other players in the round (see details below).
However, generally speaking, Elo ratings are much easier to calculate, and they can be generated for any tournament for which scores exist, whether or not SSAs are available. This means we can generate ratings for players all the way back to the 1984 Pro Worlds (the earliest tournament data available on the PDGA website). As I generate these 30+ years of ratings, I plan to improve upon my estimation of what is called regression to the mean,3 which controls for random fluctuations in performance that can result from many factors, the most common of which is small sample size (fewer rounds played).
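Regression to the mean is commonly implemented by shrinking each rating toward the baseline in proportion to how little a player has competed. The sketch below shows the general technique, not necessarily the exact estimate used for these ratings, and the `phantom_rounds` parameter is hypothetical:

```python
BASELINE = 1500.0  # the starting rating used throughout

def regress_to_mean(rating, rounds_played, phantom_rounds=10):
    """Shrink a rating toward the baseline before a new season.

    phantom_rounds (a hypothetical parameter, not a value from the
    article) acts like a number of imaginary baseline-level rounds
    mixed into the sample, so players with few observed rounds are
    pulled back harder than seasoned ones.
    """
    weight = rounds_played / (rounds_played + phantom_rounds)
    return weight * rating + (1.0 - weight) * BASELINE
```

With these numbers, a 1600-rated player with only 10 rounds regresses halfway back to 1550, while one with 90 rounds keeps most of the gain at 1590.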
Moving forward, I think these 2017 ratings provide a nice starting point for predicting future performance and generate some interesting questions. For example, can Nate Sexton build on his strong 2017 finish (his maximum rating was his season end rating) and make a move on the dominant players? How quickly can any up-and-comers, like Kevin Jones, James Conrad, and Lisa Fajkus, shoot up the ranks? We'll have to wait and see.
The development of Elo ratings is one form of analytics that can enhance our understanding of the sport and provide a solid foundation for exploring it further. As soon as the 2018 season starts later this month, I will continue calculating Elo ratings for all participating players and expand the ratings to include all Disc Golf Pro Tour tournaments.
The method behind these ratings was first developed by Arpad Elo, a physics professor, who wanted a quantitative way to compare chess players. ↩
Methods: The Elo rating equation is: Elo Rating = PR + K*(S - 2*ES/N), where PR = previous rating, K = K-factor, S = round score, ES = expected score (based on other players competing in the same round), and N = number of players. K is a parameter that controls the volatility in ratings. Bigger K values mean more volatility. The K-factor I used was 20, which is a value that works well in a variety of sports. The 2*ES/N portion is modified from the classic Elo rating equation to deal with the fact that disc golf is not a one-on-one sport like chess (see: Building a rating system and Building a modified Elo rating system). For the first round of competition, when there was no PR, I used a baseline value of 1500. The baseline value can be anything you want (chess uses 1000), and it doesn't really change your interpretation. I chose 1500 because it is commonly used for other sports. I extracted all data from the PDGA website. ↩
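A direct transcription of that equation, taking S, ES, and N as given inputs (how S and ES are derived from raw round scores follows the linked articles and is not reproduced here):

```python
def elo_update(pr, s, es, n, k=20):
    """One-round update, transcribing the footnote's equation:
    new rating = PR + K * (S - 2*ES/N).
    """
    return pr + k * (s - 2.0 * es / n)
```

A player who scores exactly as expected (S equal to 2*ES/N) keeps their rating unchanged; outperforming expectation by 0.2 with K = 20 adds 4 points.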
I did include an estimate of regression to the mean when calculating the 2017 ratings, but my estimate will be more accurate when data from more years are included. ↩