Predictions in Soccer: Getting things Right more often than Wrong

A new book, Soccer Analytics: An Introduction using R, in the Chapman & Hall data science series, shows how to make predictions in soccer using the statistical programming language R. The book shows how match outcomes can be predicted with reasonable accuracy using statistical and machine learning techniques, such as Poisson regression and conditional inference trees. The prediction of league outcomes is also discussed, and techniques are presented for accurately predicting end-of-season points totals.

Predicting match outcomes

Although soccer is governed by the laws of Association Football and physics, it is nonetheless affected by countless chance events, which can influence the outcome of matches. This, together with the fact that soccer is a low scoring sport in which about a quarter of matches are drawn, means that accurately predicting the outcome of matches is a challenging task.

Manchester United versus Cardiff

We all know matches where the best team lost due to a lucky goal scored against the run of play. Dull matches ending in a draw are also not uncommon. All of this makes the prediction of winners in soccer difficult. Even the bookmakers struggle to predict match winners, with the bookmakers’ favourite generally winning only about 55% of the time. Consider, for example, the match between Manchester United and Cardiff City at Old Trafford in May 2019, when Cardiff had already been relegated from the Premier League. The bookmakers made Manchester United the overwhelming favourite and predicted that Cardiff had little chance of winning, with Bet365 offering odds of 9:1 for an away win. Yet, miraculously, Cardiff won the match 0-2, despite Manchester United having 74% of the possession and ten shots on target, compared with Cardiff’s four. We can only conclude that unexpected events occur, and that they make predicting the outcome of soccer matches difficult, especially when teams are evenly matched.
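To put the 9:1 odds quoted above into perspective, fractional odds can be converted into an implied probability. The short Python sketch below (the book itself works in R) does this conversion; the function name is purely illustrative, and the calculation ignores the bookmaker's margin, so the bookmaker's true estimate of Cardiff's chances would have been somewhat lower still.

```python
def implied_probability(odds_against: float, odds_for: float = 1.0) -> float:
    """Convert fractional odds (e.g. 9:1 against) into an implied probability.

    Illustrative helper only: it ignores the bookmaker's built-in margin.
    """
    return odds_for / (odds_against + odds_for)


# Odds of 9:1 for a Cardiff win, as quoted in the text
p_cardiff = implied_probability(9, 1)
print(f"Implied probability of an away win: {p_cardiff:.0%}")  # prints 10%
```

In other words, the odds implied roughly a one-in-ten chance of the result that actually happened.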

However, if we take a probabilistic approach to match prediction, things become easier. Because of uncertainty and chance, it is not possible to predict soccer match outcomes with 100% accuracy. If we accept this, and instead focus on getting things right more often than we get them wrong, then we can start to make realistic predictions. So, how accurate can we be? Well, if we simply selected the ‘home win’, ‘draw’, and ‘away win’ outcome options at random, we would expect to achieve 33.3% overall prediction accuracy. So, anything better than 33.3% would suggest that our prediction method is working to some extent.
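The 33.3% baseline holds however often each result actually occurs, because a uniformly random guess matches any given outcome one time in three. The Python sketch below (not taken from the book, which uses R) makes the arithmetic explicit. The home-win and away-win frequencies are hypothetical values chosen for illustration; only the draw rate of about a quarter comes from the text.

```python
from fractions import Fraction


def random_guess_accuracy(outcome_frequencies):
    """Expected accuracy when guessing each outcome with equal probability."""
    guess_probability = Fraction(1, len(outcome_frequencies))
    # For each outcome: P(outcome occurs) * P(we happened to guess it)
    return sum(freq * guess_probability for freq in outcome_frequencies.values())


# Hypothetical frequencies (assumption), with draws at about a quarter
hypothetical_frequencies = {
    "home win": Fraction(9, 20),
    "draw": Fraction(1, 4),
    "away win": Fraction(3, 10),
}

baseline = random_guess_accuracy(hypothetical_frequencies)
print(f"Random-guess accuracy: {float(baseline):.1%}")  # prints 33.3%
```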

In Soccer Analytics: An Introduction using R, the reader is introduced to techniques for match prediction that can achieve prediction accuracies of up to 60%, well above the 33.3% accuracy achieved by random selection. Several prediction methods are introduced in the book, including Poisson regression, random forests, conditional inference trees, and Elo ranking. These are discussed in detail, and the reader is shown, using easy-to-follow examples, how these techniques can be applied to predict match outcomes.
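To give a flavour of the Poisson approach, the Python sketch below shows one common way of turning expected goals into match outcome probabilities. It is not the book's code (the book uses R) and it does not fit a regression: it assumes a model has already supplied an expected number of goals for each team, and that the two teams' goals are independent Poisson variables. The function names and the goal rates used in the example are hypothetical.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """Probability of exactly k goals when the expected number is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)


def outcome_probabilities(home_rate: float, away_rate: float, max_goals: int = 10):
    """Sum the probabilities of every scoreline up to max_goals per team.

    Assumes home and away goals are independent (a simplifying assumption).
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    total = home + draw + away  # slightly below 1 because the grid is truncated
    return {"home win": home / total, "draw": draw / total, "away win": away / total}


# Hypothetical expected goals for the home and away teams
for outcome, prob in outcome_probabilities(1.6, 1.1).items():
    print(f"{outcome}: {prob:.1%}")
```

The predicted result is simply the outcome with the highest probability, which is why such a model can be right more often than wrong without ever being certain.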

Predicting league outcomes

Predicting end-of-season league outcomes is actually easier than predicting individual match outcomes in soccer. This is because every league in football behaves strictly according to a set of mathematical laws which define its dynamics [1]. As a result, the competing teams undergo a kind of random walk as they progress through the various rounds of competition, with points earned and banked on a continual basis. So, when it comes to predicting future league performance, we have a lot of relevant historical information that can be utilised to make predictions more accurate.

In Soccer Analytics: An Introduction using R, the reader is introduced to techniques, such as the Pythagoras points system, for predicting league outcomes. Such techniques can be very accurate in predicting end-of-season league performance, especially as the competition progresses. As such, these prediction techniques should be of interest both to fans and those working in professional soccer, all of whom want to know where their team is likely to finish in the league at the end of the season.
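The general idea behind Pythagorean methods can be sketched in a few lines of Python. This is a generic illustration, not the formulation used in the book (which works in R): it applies the classic Pythagorean expectation, goals for raised to a power divided by the sum of goals for and goals against raised to the same power, and scales the result to league points. The exponent of 2, the function names, and the goal totals are all assumptions, and the simple scaling ignores draws, so it should be read as a rough approximation only.

```python
def pythagorean_share(goals_for: float, goals_against: float, exponent: float = 2.0) -> float:
    """Classic Pythagorean expectation: GF^k / (GF^k + GA^k)."""
    gf = goals_for**exponent
    ga = goals_against**exponent
    if gf + ga == 0:
        return 0.5  # no goals either way: treat the team as average
    return gf / (gf + ga)


def projected_points(goals_for, goals_against, total_matches,
                     exponent=2.0, points_per_win=3):
    """Crude end-of-season points estimate (assumption: draws are ignored)."""
    share = pythagorean_share(goals_for, goals_against, exponent)
    return share * points_per_win * total_matches


# Hypothetical team part-way through a 38-match season
estimate = projected_points(goals_for=40, goals_against=25, total_matches=38)
print(f"Projected end-of-season points: {estimate:.0f}")
```

Because goals scored and conceded accumulate as the season progresses, an estimate of this kind tends to settle down as more matches are played.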


  1. Beggs CB, Bond AJ, Emmonds S, Jones B. Hidden dynamics of soccer leagues: the predictive ‘power’ of partial standings. PLoS ONE. 2019;14(12):e0225696.

Copyright: Clive Beggs