The 2018 PyeongChang Olympics have concluded and the medal counts are in. Norway led the standings with 39 medals and 14 gold; Germany was second, with 31 medals and 14 gold; and Canada third, with 29 medals and 11 gold.
I’m not much of a sports junkie, but I do tend to get pulled into the Olympics. Seeing these medal tallies made me wonder: to what extent could we have predicted these results?
Well, it didn’t take much internet searching to uncover that prior to the 2018 Olympics, a team of researchers at Sciencing set out to predict the outcomes of the 2018 Winter Games. Sumanik Singh, a data scientist, Jacob Lauing, an editor, and Jesse Harold, a user experience developer, predicted the outcomes for each event by building models from historical data on athletes’ performance.
How did they make their predictions?
Prior to the 2018 Olympics, the research team gathered data from seven governing bodies encompassing the Winter Olympics’ various sports (for example, the International Ski Federation for ski and snowboard events, and the International Skating Union for skating events). The research team then quantified athletes’ performance in a few different ways, based on the type of data available. If the available data were based on official points, the athletes were ranked according to their scores. If the available data were time- or score-based, times or scores were normalized relative to the best result using a min-max normalization, and athletes were ranked accordingly. Finally, if the available data were based on standings, the researchers applied a rank-score function, an exponentially decaying function with a maximum value of 100 and a minimum value of 1, and then sorted and ranked the athletes.
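These two scoring schemes are easy to sketch in Python. The Sciencing team’s exact formulas aren’t published in full, so the decay rate, the 100-to-1 range, and the field size below are illustrative assumptions:

```python
import math

def min_max_normalize(scores):
    """Min-max normalize scores so the worst maps to 0 and the best to 100.

    Assumes higher is better; for time-based events (lower is better),
    negate the times before passing them in.
    """
    lo, hi = min(scores), max(scores)
    return [100 * (s - lo) / (hi - lo) for s in scores]

def rank_score(rank, field_size=30, top=100.0, bottom=1.0):
    """Exponentially decaying rank-score: rank 1 -> top, rank field_size -> bottom.

    The decay constant is chosen so the endpoints land exactly on top/bottom;
    field_size=30 is an arbitrary placeholder for the size of the field.
    """
    decay = math.log(top / bottom) / (field_size - 1)
    return top * math.exp(-decay * (rank - 1))
```

With these sketches, an athlete ranked 1st gets a rank-score of 100, and the score decays exponentially toward 1 for the last-ranked athlete, so small differences near the top of the standings count for much more than differences near the bottom.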
The researchers looked at points, times, scores, or rankings from three different datasets: the 2016-17 World Cup season (weighted at 0.5), the 2017-18 World Cup season (weighted at 1.0), and the last 3-5 events for each sport (weighted at 1.0). The latter two datasets were given higher weights because they were thought to better reflect athletes’ performance heading into the 2018 Olympics.
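A minimal sketch of this weighting step, assuming the weighted scores are simply averaged (the team’s exact aggregation method isn’t specified, and the dataset names are placeholders):

```python
def combined_score(scores_by_dataset, weights):
    """Weighted average of an athlete's scores across datasets.

    scores_by_dataset: {dataset_name: score, or None if no data available}
    weights: {dataset_name: weight}
    Datasets with no data for the athlete are skipped, and the remaining
    weights are renormalized.
    """
    total = weight_sum = 0.0
    for name, score in scores_by_dataset.items():
        if score is None:
            continue  # athlete has no results in this dataset
        total += weights[name] * score
        weight_sum += weights[name]
    return total / weight_sum if weight_sum else 0.0

# Hypothetical weights mirroring the scheme described above:
weights = {"prior_season": 0.5, "current_season": 1.0, "recent_events": 1.0}
```

For example, an athlete scoring 100 in the prior season and 50 in the current season, with no recent-event data, would get (0.5 × 100 + 1.0 × 50) / 1.5 ≈ 66.7.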
Using data from these events (as available), the researchers predicted the top three athletes for each sport.
How did the data scientists do in predicting the 2018 Olympics?
Pretty darn well!
Here are their predictions for the top 5 countries:
And here are the official results:
As a Canadian, I can’t help noting the results for Canada. Not only did the scientists predict our third place standing, they predicted Canada’s 29 medals right on. Nice work!
Not shown above, but the team also correctly predicted host country South Korea’s ranking (#7).
Notably, the predictions for individual sports were less accurate. For example, in figure skating, the researchers predicted that Japan would come out on top, and Canada would rank 5th across the various events. As it happened, Japan ranked 3rd with 1 gold and 1 silver, and Canada came out on top, with 2 gold and 2 bronze. On the flip side, Canada was predicted to dominate ice hockey, but did not fare so well.
The researchers also tracked the accuracy of their predictions for each specific event on a day-by-day basis. Eyeballing those results, it appears the majority of individual event medal winners were predicted within 5 ranks of their actual standing. Meanwhile, it appears that the proportion of medal winners who were predicted right on (e.g. Charlotte Kalla of Sweden winning gold in Women’s 15km Skiathlon) was similar to the proportion of medal winners who totally defied the predictions (e.g. Carlijn Achtereekte of the Netherlands winning gold in Women’s 3000m Speed Skating).
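That “within 5 ranks” eyeball check is simple to express in code. The (predicted, actual) rank pairs below are purely hypothetical, just to show the calculation:

```python
def within_k(predicted_rank, actual_rank, k=5):
    """Was a medalist's predicted rank within k places of their actual standing?"""
    return abs(predicted_rank - actual_rank) <= k

# Hypothetical (predicted, actual) rank pairs for five medal winners:
pairs = [(1, 1), (3, 7), (2, 14), (6, 2), (1, 4)]
share_within_5 = sum(within_k(p, a) for p, a in pairs) / len(pairs)
```

Running this over the real day-by-day predictions would give the proportion of medal winners the models placed within 5 ranks of their final standing.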
These findings illustrate a general principle of making predictions with data, namely that aggregate predictions are more accurate than individual predictions. While the researchers correctly predicted 4 out of the top 5 countries, there was greater variation in the accuracy of predictions for particular sports, and even greater variation in accuracy for particular events. What’s cool (at least to me!) is that as the data are aggregated, the variations in individual predictions wash out, leading to increasingly accurate predictions.
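This washing-out effect is easy to demonstrate with a toy simulation, under the simplifying assumption that per-event prediction errors are independent with zero mean (real medal predictions are surely messier):

```python
import random
import statistics

def aggregation_demo(n_events=50, sigma=1.0, trials=5000, seed=1):
    """Toy illustration of why aggregate predictions beat individual ones.

    Each event prediction is off by a random error with spread sigma.
    Summed over n_events, errors partially cancel, so the aggregate error
    (measured per event) is much smaller than a single event's error.
    """
    rng = random.Random(seed)
    per_event, per_aggregate = [], []
    for _ in range(trials):
        errors = [rng.gauss(0, sigma) for _ in range(n_events)]
        per_event.append(abs(errors[0]))                  # one event's error
        per_aggregate.append(abs(sum(errors)) / n_events) # total error per event
    return statistics.mean(per_event), statistics.mean(per_aggregate)

individual_error, aggregate_error = aggregation_demo()
```

Under these assumptions the aggregate error shrinks roughly in proportion to the square root of the number of events aggregated, which is why a country’s total medal count is far easier to predict than any single event.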
Great work on this, Sumanik Singh, Jacob Lauing, and Jesse Harold. And kudos to all the incredibly disciplined athletes, hopefully now getting a brief moment to kick back and relax!
As noted in a recent issue of the Economist, “the world’s most valuable resource is no longer oil, but data.” As the digital revolution steadily transforms our economy, we are producing data in record volumes. By one estimate, more data has been produced in the last 2 years than in all of human history before that.
While the sheer volume of data is indisputable, what makes data so valuable? First, data allows companies to monitor their performance, and compare it against meaningful benchmarks, competitor performance, or the general marketplace. Second, data offers an unprecedented window into customers’ preferences, needs, and behaviours.
In a well-functioning data-driven organization, insights into company performance and consumer behaviour offer decision-makers the information they need to make smarter decisions, resulting in better return on investment (ROI). Indeed, a study by the Economist Intelligence Unit found that companies that describe themselves as data-driven were three times as likely to rate themselves ahead of their competitors in terms of financial performance.
However, just as crude oil is not very useful without refinement, raw data is of minimal utility. Data is only valuable to the extent that relevant insights can be extracted from it. And in spite of the gains that data-driven companies are realizing in the marketplace, many companies are struggling to effectively integrate data insights with decision-making. According to a 2016 study of 300 senior executives in 16 countries, published by the Association of International Certified Professional Accountants (CGMA), only 27% of C-level executives believe that their company makes “highly effective” use of data. Meanwhile, 32% of executives claimed that the tidal wave of data now available has actually made things worse!
Many decision-makers are (understandably) overwhelmed by the volume of information now available to them. In their book by the same title, seasoned business executives Christopher J Frank and Paul Magnone call this phenomenon “drinking from the fire hose.” How can decision-makers – already stretched thin with a thousand competing priorities – harness the value of their data to make better decisions about how to serve their customers?
As noted in the CGMA report (p. 13): “High-quality decision making has never been more important—or more difficult.”
Although we now have access to information like never before, this is not the first time our species has faced information overload. In fact, long before smartphones and computers were invented, we were already facing a deluge of information! And even if we manage to set aside our digital tools today, we still encounter more information than our brains can process.
Where is this information coming from? It is arriving from the world around us through our sense organs: our eyes, ears, nose, tongue, and skin. Our brain is constantly being blasted by signals from our sense organs.
There is far more information available in the world than we have cognitive resources to process, and only a tiny fraction of that can be processed consciously with our limited attentional resources. Yet, somehow, we must make our way through the world through a series of decisions. How do we do this?
Our brains find a way to narrow down the available information, to focus on the most relevant pieces.
To demonstrate, take a look at the flashing image below. It is alternating between two images with one significant difference between them. Can you spot it?
Are you still looking? It often takes several seconds, sometimes even minutes, to spot the difference. Yet as soon as you have spotted it, the difference between the images seems totally obvious and difficult not to see! (If you’re stuck, the answer is provided at the bottom of this post.)
In psychology, this phenomenon is known as change blindness. It is one of my personal favourite demonstrations of just how much information is outside of our awareness. Even when we feel that we are absorbing most of the visual information in the world, the amount of information we are consciously processing is actually very small.
The brain has evolved to deal with large volumes of sensory input through selective filtering. This means we pay attention to a small sliver of the information available to us, and ignore the rest. We filter out details that are unlikely to be important, in order to preserve our precious cognitive resources. In this way, we avoid being overwhelmed by the information continually arriving through our sense organs.
Data – the second information deluge
Although the phenomenon of information overload may not be new, we are now facing a second form of information overload. This time, rather than sensory information arriving through our eyes, ears, and other sensory organs, the information is coming at us in digital format. We have evolved over hundreds of millions of years to deal with sensory information, but only in the last one or two decades have we begun to face a true tidal wave of digital information.
Data may be the new oil – but how do we harness it? How can decision-makers maintain their sanity when confronting a tidal wave of digital information?
Investing in data infrastructure and human resources (such as analysts, data engineers, and data scientists) is part of the solution. However, ultimately, for data to be useful, it needs to feed into decision-making – which means anyone in a decision-making role needs to be able to access and make use of data insights. And this means decision-makers need to shift away from asking, “How can I use all this information?” to asking, “What information is important, and what can be ignored?” In other words, how can we put filters on the data to find what’s really important to us?
In their book, “Drinking from the Fire Hose: Making Smarter Decisions Without Drowning in Information,” Christopher Frank and Paul Magnone suggest that decision-makers start by asking themselves: “What is the essential business question?”
“Think of that question as a valve, or a filter, on the data hose. It puts you in control. It governs the rate, flow and direction of the data you collect and manipulate, and even helps determine how you’ll be able to deliver it.” (Ch. 1, par. 3)
The volume of data humans produce, already staggeringly high, is estimated to grow at 40% per year. As the digital revolution continues to unfold, every company becomes a digital company. Meanwhile, organizations are still figuring out exactly how to make use of the wealth of information suddenly available.
Storing data is not enough. The organizations that see the biggest advantages from data are the ones who know what information they need and how to home in on it. By starting with the essential question, decision-makers can apply selective filtering to their data. This means they remain in the driver’s seat, fuelling their decisions with the powerful insights data provides.
(Answer to the change blindness demo: on one of the images, the airplane is missing a propeller under its wing).