The 2018 PyeongChang Olympics have concluded and the medal counts are in. Norway led the standings with 39 medals and 14 gold; Germany was second with 31 medals and 14 gold; and Canada was third with 29 medals and 11 gold.
I’m not much of a sports junkie, but I do tend to get pulled into the Olympics. Seeing these medal tallies made me wonder: to what extent could we have predicted these results?
Well, it didn’t take much internet searching to learn that, ahead of the 2018 Olympics, a team at Sciencing had set out to predict the outcomes of the Winter Games. Sumanik Singh, a data scientist; Jacob Lauing, an editor; and Jesse Harold, a user experience developer, predicted the outcome of each event by building models from data on athletes’ past performance.
How did they make their predictions?
Before the Games, the research team gathered data from seven governing bodies covering the Winter Olympics’ various sports (for example, the International Ski Federation for ski and snowboard events, and the International Skating Union for skating events). The team then quantified athletes’ performance in one of three ways, depending on the type of data available. If the data were official points, athletes were ranked by their points. If the data were times or scores, these were normalized relative to the highest score using min-max normalization, and athletes were ranked accordingly. Finally, if the data were standings, the researchers applied a rank-score function (an exponentially decaying function with a maximum value of 100 and a minimum value of 1) to turn each standing into a score, and then sorted and ranked the athletes.
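The three scoring rules above can be sketched in Python. This is a minimal sketch, not the researchers’ code: the function names are hypothetical, flipping the scale for time-based events (where lower is better) is an assumption, and because the exact decay rate of their rank-score function isn’t given, this version assumes a geometric decay from 100 at first place to 1 at last place.

```python
from typing import Dict, List


def min_max_normalize(values: List[float], lower_is_better: bool = False) -> List[float]:
    """Scale raw times or scores to [0, 1], where 1 is the best result.

    Assumption: for time-based events (lower is better) the scale is flipped
    so that the fastest time maps to 1.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # everyone tied: treat all as best
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def rank_score(rank: int, n_ranks: int, max_score: float = 100.0, min_score: float = 1.0) -> float:
    """Hypothetical exponentially decaying rank-score: 1st place -> 100, last place -> 1."""
    if n_ranks == 1:
        return max_score
    # Each place down multiplies the score by the same factor (geometric decay).
    return max_score * (min_score / max_score) ** ((rank - 1) / (n_ranks - 1))


def rank_athletes(scores: Dict[str, float]) -> List[str]:
    """Sort athletes from highest to lowest score."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical race times in seconds (lower is better).
times = {"Athlete A": 120.5, "Athlete B": 118.2, "Athlete C": 125.0}
normalized = dict(zip(times, min_max_normalize(list(times.values()), lower_is_better=True)))
print(rank_athletes(normalized))  # ['Athlete B', 'Athlete A', 'Athlete C']
```

Mapping every data type onto a common "higher is better" scale is what lets points, times, and standings be ranked the same way afterwards.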
The researchers looked at points, times, scores, or rankings from three datasets: the 2016-17 World Cup season (weighted at 0.5), the 2017-18 World Cup season (weighted at 1.0), and the last 3-5 events for the sport (weighted at 1.0). The latter two datasets were given higher weights because they were thought to better reflect athletes’ form heading into the 2018 Olympics.
Using whichever of these datasets were available, the researchers predicted the top three athletes for each sport.
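One way to combine the weighted datasets and pick a podium is sketched below. The article doesn’t say exactly how the weights were applied, so this assumes a weighted average over whichever datasets exist for a given athlete, with the weights renormalized when one is missing. The dataset keys, athlete names, and scores are all hypothetical.

```python
from typing import Dict, List

# Weights as described in the article; the key names are hypothetical labels.
WEIGHTS = {"prior_world_cup": 0.5, "current_world_cup": 1.0, "recent_events": 1.0}


def combined_score(scores: Dict[str, float]) -> float:
    """Weighted average of an athlete's normalized scores over available datasets.

    Assumption: missing datasets are skipped and the remaining weights renormalized.
    """
    used = {k: v for k, v in scores.items() if k in WEIGHTS}
    total_weight = sum(WEIGHTS[k] for k in used)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[k] * v for k, v in used.items()) / total_weight


def predict_podium(athletes: Dict[str, Dict[str, float]]) -> List[str]:
    """Return the three athletes with the highest combined score, best first."""
    ranked = sorted(athletes, key=lambda name: combined_score(athletes[name]), reverse=True)
    return ranked[:3]


# Hypothetical normalized scores (0-100); Athlete B has no prior-season data.
athletes = {
    "Athlete A": {"prior_world_cup": 90, "current_world_cup": 70, "recent_events": 75},
    "Athlete B": {"current_world_cup": 85, "recent_events": 88},
    "Athlete C": {"prior_world_cup": 60, "current_world_cup": 80, "recent_events": 82},
    "Athlete D": {"prior_world_cup": 95, "current_world_cup": 50, "recent_events": 55},
}
print(predict_podium(athletes))  # ['Athlete B', 'Athlete C', 'Athlete A']
```

Note how Athlete D’s strong earlier season counts for only half as much, so recent form dominates the prediction, which matches the rationale for the weights.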
How did the data scientists do in predicting the 2018 Olympics?
Pretty darn well!
Here are their predictions for the top 5 countries:
And here are the official results:
As a Canadian, I can’t help noting the results for Canada. Not only did the scientists predict our third-place finish, they also predicted Canada’s 29 medals exactly. Nice work!
Although it isn’t shown above, the team also correctly predicted the ranking of host country South Korea (#7).
Notably, the predictions for individual sports were less accurate. For example, in figure skating, the researchers predicted that Japan would come out on top, and Canada would rank 5th across the various events. As it happened, Japan ranked 3rd with 1 gold and 1 silver, and Canada came out on top, with 2 gold and 2 bronze. On the flip side, Canada was predicted to dominate ice hockey, but did not fare so well.
The researchers also tracked the accuracy of their predictions for each event on a day-by-day basis. Eyeballing those results, most individual medal winners appear to have been predicted within 5 ranks of their actual finish. Meanwhile, the proportion of medal winners who were predicted exactly (e.g., Charlotte Kalla of Sweden winning gold in the Women’s 15km Skiathlon) appears similar to the proportion who completely defied the predictions (e.g., Carlijn Achtereekte of the Netherlands winning gold in the Women’s 3000m Speed Skating).
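A tally like the one described above could be computed along these lines. This is a hypothetical helper, not the researchers’ evaluation code: it assumes each medal winner’s predicted rank is known (or missing/`None` if they weren’t ranked at all) and uses a 5-rank window to match the eyeballed threshold. The athletes and ranks in the example are made up.

```python
from typing import Dict, Optional


def rank_error_summary(
    predicted_rank: Dict[str, Optional[int]],
    actual_rank: Dict[str, int],
    window: int = 5,
) -> Dict[str, float]:
    """Share of medal winners predicted exactly, within `window` ranks, or outside it.

    An athlete missing from predicted_rank (or mapped to None) counts as
    outside the window, i.e. someone who defied the predictions.
    """
    n = len(actual_rank)
    exact = within = 0
    for athlete, actual in actual_rank.items():
        pred = predicted_rank.get(athlete)
        if pred is None:
            continue  # not predicted at all: falls outside the window
        error = abs(pred - actual)
        if error == 0:
            exact += 1
        if error <= window:
            within += 1
    return {
        "exact": exact / n,
        "within_window": within / n,
        "outside_window": (n - within) / n,
    }


# Hypothetical podium for one event: actual finish vs. predicted rank.
actual = {"Athlete A": 1, "Athlete B": 2, "Athlete C": 3}
predicted = {"Athlete A": 1, "Athlete B": 6, "Athlete C": None}
print(rank_error_summary(predicted, actual))
```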
These findings illustrate a general principle of making predictions with data: aggregate predictions are more accurate than individual predictions. While the researchers correctly predicted 4 of the top 5 countries, the accuracy of their predictions varied more for particular sports, and even more for particular events. What’s cool (at least to me!) is that as the data are aggregated, the errors in individual predictions tend to cancel out, leading to increasingly accurate predictions.
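A toy simulation makes the "wash out" effect concrete. This is not the researchers’ model: it assumes a country has the same probability `p` of medalling in each of many independent events, and that the forecast for every event is simply `p`. Each individual forecast is always off by either `p` or `1 - p`, but over- and under-predictions cancel in the total, so the relative error of the predicted medal count tends to shrink as more events are added.

```python
import random
from typing import Tuple


def prediction_errors(n_events: int, p: float = 0.3, seed: int = 0) -> Tuple[float, float]:
    """Toy model comparing per-event and aggregate prediction error.

    Assumption (not the researchers' model): a country medals in each of
    n_events independent events with probability p, and we "predict" p for
    every event.
    """
    rng = random.Random(seed)
    outcomes = [1 if rng.random() < p else 0 for _ in range(n_events)]
    # Per-event error: each forecast misses by p (no medal) or 1 - p (medal),
    # so the average never drops below min(p, 1 - p), however many events.
    per_event = sum(abs(p - o) for o in outcomes) / n_events
    # Aggregate error: relative gap between predicted and actual medal totals.
    # Misses in both directions cancel, so this tends to shrink as n_events grows.
    predicted_total = p * n_events
    actual_total = sum(outcomes)
    aggregate = abs(predicted_total - actual_total) / predicted_total
    return per_event, aggregate


for n in (10, 100, 10_000):
    per_event, aggregate = prediction_errors(n)
    print(f"{n:>6} events: per-event error {per_event:.2f}, aggregate error {aggregate:.1%}")
```

In this model the aggregate error shrinks roughly in proportion to one over the square root of the number of events, which is the statistical intuition behind country-level totals being easier to predict than single events.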
Great work on this, Sumanik Singh, Jacob Lauing, and Jesse Harold. And kudos to all the incredibly disciplined athletes, hopefully now getting a brief moment to kick back and relax!