The best records in sports

May 22, 2019

Recently John Burn-Murdoch updated his interactive football viz and tweeted about how great Lionel Messi is.

Itā€™s fair to say that in recent memory, his scoring rate stands out as exceptional.

But over the years, in a variety of sports there have been exceptional performers. This begs the question, is Lionel Messiā€™s goal rate more impressive than Rafa Nadalā€™s clay court win %? What about Don Bradmanā€™s batting average?

Itā€™s basically impossible to objectively compare players within the same sport across different eras. Itā€™s even more ridiculous to compare different sports across eras. But itā€™s also fun.

With that goal in mind (fun!) I decided to get hold of some data from other sports and do some comparisons. This isnā€™t a comprehensive view on all sports or potential records, just a small number that I could come up with off the top of my head and fairly easily find some data on.

The full code for doing this is on my Github.

Tennis

Letā€™s start with tennis. I couldnā€™t find a website with the stats I wanted which was easy to webscrape, so I ended up manually copy and pasting a few pages from atptour.com into Excel. Iā€™m not entirely sure which players are included here, there are some good and some bad records so thereā€™s a spread of players but Iā€™m not sure whether itā€™s covering ā€œall professional tennis players in the open eraā€ or something more specific.

If I want to compare multiple different sorts of records over many sports, itā€™s going to be necessary to have a consistent scale. Goals per game in football doesnā€™t share a common scale with win rate. And even if we have two different win %ages it might be the case that itā€™s easier to win in tennis than in F1, so the shape of these data are different. To get around this, Iā€™ve decided to use z scores (or standardised scores) where Iā€™ve subtracted the mean and divided by the standard deviation. These numbers then represent how far top performers are from average performers accounting for how much variation there is in performance.

If the variables are normally distributed (look like a bell curve) then a value which is greater than 3 is the top 0.01% of performance. As weā€™ll see, a lot of these variables arenā€™t normally distributed so this analysis isnā€™t perfect, but I think itā€™s an acceptable attempt at what is probably an impossible problem.

For most of the records, Iā€™ve looked at rates to try and control for the fact that in a lot of sports the number of games played has increased over time. This means that itā€™s easier to stand out with a smaller number of games played, but there will always be compromises when converting down to a single number.

With that in mind, hereā€™s a visual looking at the grass, clay and overall win rates of tennis players:

An interesting stat (to me anyway) I uncovered while building this is that Andy Murray has a better win rate on grass courts than both Pete Sampras and Novak Djokovic. Djokovic ending up with the 6th best all time Clay win rate is also interesting given heā€™s had to deal with overlapping with Nadal.

Cricket

Iā€™ve already mentioned Don Bradmanā€™s test cricket batting average, so letā€™s move on to look at cricket. Here I found the data on Espncricinfo and filter to all players who have played in at least 10 test matches.

Whereas in tennis you can argue that Bjorn Borg is in the rough region of Nadalā€™s dominance on clay, no one is close to Bradman.

Formula 1

Moving on to Formula 1, currently weā€™re seeing Lewis Hamilton potentially on course to end his career with the a record number of wins, poles and championships. But if we focus on his per race stats, how does he stack up?

This data was filtered to drivers with at least one win, one pole or one fastest lap over their careers.

You can make a case for Hamilton being by far the most statistically successful driver in the modern era, but he falls short of the ridiculous statistics achieved by Fangio, Ascari and Clark. On the subject of Jim Clark, the piece Richard Hammond presented on him in most recent series of the Grand Tour is excellent and worth a watch for F1 fans.

Football

Finally, we can move back to football and see how Messi stands out on the same metrics. Iā€™ve used two different data sources for this, one is a copy of John Burn-Mudrochā€™s data set which sits behind his visualisation which covers 186 of the best goal scores in the top European leagues since 1990. The full definitions can be seen here.

I also scraped a more extensive set of data which covers the top 1000 all time goal scorers from each of the top five European leagues and major European club competitions. It turned out to be quite hard to get something in the middle of these two datasets as it would involve needing to grab information from a lot of different pages so Iā€™ve left it as is.

Itā€™s quite interesting to see the impressive stats of names from the 1940s-1960s who Iā€™d never come across before, even if they arenā€™t really that comparable to the modern game.

Here, Messi stands out in the modern era, but depsite the similar distributions for the bulk of players his achievements arenā€™t as extreme in the context of the history of football. And though Messi does stand relatively alone in the modern era, Cristiano Ronaldo clearly bridges the gap to the rest of the pack as another exceptional performer. This is a bit similar to Bjorn Borgā€™s clay court record in tennis making Nadal look less of an outlier than he otherwise would.

Putting it all together

Finally, letā€™s combine the most interesting figures from each sport into one visual and see how everyone stacks up:

Where do we end up?

  • The best F1 pole sitters are more extreme than the best records from the other sports weā€™ve seen here.
  • Don Bradmanā€™s test average is arguably the most ā€œpeerlessā€ record, most others have someone else a little bit behind bridgin the gap the rest (Borg, Ronaldo, Clark) but he stands head and shoulders above the rest.
  • Messiā€™s record is could harshly be judged as ā€œmiddle of the road greatnessā€ in the context of other sports.

Again, Iā€™m not saying these comparisons are accurate scientific judgement, but one potential take on how you might try and make impossible comparisons!

Iā€™d love to see other takes on this sort of comparison and read any comments either on this site or via twitter

comments powered by Disqus