Comparing the Predictive Power of Past Results Between Soccer Leagues

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2016 Comparing the Predictive Power of Past Results Between Soccer Leagues JO...

Author: Belinda Jones

DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS STOCKHOLM, SWEDEN 2016

Comparing the Predictive Power of Past Results Between Soccer Leagues JOHAN SANNEMO SIMON LINDHOLM

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Comparing the Predictive Power of Past Results Between Soccer Leagues En jämförelse av det prediktiva värdet av tidigare resultat mellan fotbollsligor

JOHAN SANNEMO

SIMON LINDHOLM

Degree Project in Computer Science, DD143X Supervisor: Arvind Kumar Examiner: Örjan Ekeberg

CSC, KTH, 2016-05-10

Abstract

In this thesis, the performance of a number of models used to predict the result in soccer games has been investigated, using data on the previously played games in the league. Established models were implemented, and tested on a wide set of soccer leagues during several years. The performance of each model was measured using the likelihood ratio against a simple baseline distribution. The performance of these models was then analyzed to find systematic differences correlating with some properties of a soccer league, such as average number of goals in the league, and determine which models overall performed best. The results showed that such differences do exist, correlating with the average number of goals in a league as well as the variance in performance among teams in the league. Additionally, statistically significant differences in the performance of some models were established.

Summary

In this report, the performance of a number of models for predicting results in soccer matches was investigated, with the models trained on the results of earlier matches in the league. Established models were implemented and then tested on several seasons of a broad selection of soccer leagues. The performance of each model was measured as a likelihood ratio against a simple baseline distribution. The performance of the models was then analyzed to find systematic differences that correlate with some property of a soccer league, e.g. the average number of goals in the league, and to determine which models performed best. The results showed that such differences exist, and that performance correlates both with the average number of goals in the league and with the variance in performance of the teams in the league. In addition, statistically significant differences in performance were found between some of the models.

Contents

1 Introduction
  1.1 Problem statement
  1.2 Scope
  1.3 Overview
2 Background
  2.1 Soccer
    2.1.1 Home advantage
  2.2 Predicting results
  2.3 Benchmarking predictions
    2.3.1 Likelihood ratios
  2.4 Probability distributions
    2.4.1 Univariate Poisson distribution
    2.4.2 Negative binomial distribution
    2.4.3 Elo
3 Method
  3.1 Models
    3.1.1 Poisson distribution
    3.1.2 Censored Poisson distribution
    3.1.3 Censored Poisson distribution with separate home advantages
    3.1.4 Censored bivariate Poisson distribution
    3.1.5 Negative Binomial distribution
    3.1.6 Elo
    3.1.7 Elo with proportional results
    3.1.8 Elo with goal differences
  3.2 Data set
  3.3 Evaluation
  3.4 Validation
4 Results and discussion
  4.1 Results
  4.2 Limitations
  4.3 Future research
  4.4 Conclusion
Bibliography
A Model performances

Chapter 1 Introduction

Ever since people started betting on sports, there has been a need to predict the results of future sports events. Bookmakers take bets on many kinds of events, most commonly professional sport events, at some specified odds. For a bookmaker, setting good odds is the difference between making and losing money. However, to set good odds, the bookmaker needs to predict the outcome of the events as accurately as possible. Soccer is widely acknowledged to be the world's most popular sport [11], with a large betting industry. As a result, there have been many studies on how to accurately predict soccer results, using various techniques. Many models have been developed for this purpose, and studies have been made on the mathematics underlying the game (such as the probability distribution of scores).

However, most studies on the mathematics of soccer use only a very small sample of seasons in their analysis, often from a single league. This makes it hard to evaluate how generally applicable the findings of a study are, since the topic being studied may actually depend on some property of the league, like the underlying score probability distribution. In particular, there has been a lack of studies comparing the performance of a model across a large number of leagues to determine how good a data source (such as the scores of all previous games in a league) actually is, and whether this performance depends on the characteristics of the league.

1.1 Problem statement

The goal of this thesis is to analyze how well the scores of past soccer games predict future results. To this end, a number of prediction models from established literature are evaluated on a large sample of different soccer leagues. This analysis gives insight into the conditions under which past scores are a powerful predictor, and which kinds of models work best under different conditions. The thesis aims to investigate the following:

• What is the predictive power of past scores in the game of soccer?
• How does this predictive power differ among leagues?
• What models and algorithms work best under different conditions?

1.2 Scope

In the thesis, a relatively small number of established models and variations thereof are analyzed. No new models are developed, although variations suggested in the literature are applied to models even where the original publication did not do so. A single data source is considered: the goals scored in the previous matches within a certain year of a league.

1.3 Overview

Chapter 2 briefly describes the game of soccer, and the mathematics and principles behind the various models being used to predict the results. In chapter 3, the choices of data sets and models are motivated, and the process used to evaluate the performance of a model is described.

Chapter 2 Background

In this chapter, the models which the thesis builds upon and the probability distributions they are based on are described.

2.1 Soccer

In the game of soccer, two teams play against each other for 90 minutes. The goal of the game is to score goals against the opposing team. Unlike e.g. baseball, the game does not proceed in discrete steps. Instead, teams continuously compete for possession of the ball in order to score. The winner of a game is the team that scored the most goals. If the two teams score the same number of goals, the game is tied. In this thesis, matches are analyzed in the context of a soccer league. A league consists of a set of teams, which (in most cases) each play every other team twice: once at home, and once away.

2.1.1 Home advantage

A well-established phenomenon in soccer is that of home advantage: teams tend to score more goals when playing at home [3]. This is generally modelled using a constant factor A that quantifies the advantage. It is defined as the ratio between the average number of goals scored at home and the average number of goals scored away, meaning that if a team on average scores x goals away, it tends to score Ax goals at home. This factor is usually assumed to be constant within a league, but can differ greatly between leagues.

2.2 Predicting results

When predicting the result of a game, one can predict either the final score or which team wins the game (or whether it is tied). Both are usually expressed in terms of probabilities, meaning that predicting the probabilities of different final scores also gives a prediction of the ternary win/tie/loss result, by summing over all possible final scores. One of the main approaches to predicting the result is to compute some kind of fitness parameters for each team, and then use these parameters in a statistical model for the final scores in a game [8].
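The summation over final scores can be sketched as follows. This is a minimal illustration, assuming purely for the sake of example that the two scores are independent Poisson variables (the distribution introduced in section 2.4.1); the function names and the truncation point `max_goals` are hypothetical choices, not taken from the thesis.

```python
import math


def poisson_pmf(k, lam):
    """Pr[X = k] for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def outcome_probabilities(home_mean, away_mean, max_goals=15):
    """Collapse a score distribution into (home win, tie, home loss).

    Scores above max_goals are ignored, so the three values sum to
    slightly less than 1; the truncation point is an arbitrary choice.
    """
    p_win = p_tie = p_loss = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                p_win += p
            elif h == a:
                p_tie += p
            else:
                p_loss += p
    return p_win, p_tie, p_loss


# Hypothetical expected goal counts for one match.
print(outcome_probabilities(1.5, 1.1))
```

Any other score distribution could be substituted for the product of the two Poisson terms; only the summation into three outcomes is the point of the sketch.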

2.3 Benchmarking predictions

In general, models are evaluated by comparing the predicted distribution of final scores against the actual distribution of such results, with the model trained on all the matches in the series. This has the downside of potentially overfitting the models, and only explains the results globally. It does not necessarily give insight into whether the scoring in the game itself follows a certain distribution, since trends which deviate from the distribution may cancel out when aggregated.

2.3.1 Likelihood ratios

Another way of evaluating models is to use them to predict probabilities of game outcomes, and compare likelihood ratios using those. The idea behind this is that, by Bayes' theorem, and assuming a uniform prior on models:

\[ \Pr[\text{model is accurate} \mid \text{outcome}] \propto \Pr[\text{outcome} \mid \text{model is accurate}] \]

Thus, one can compare the relative likelihoods of two models by looking at ratios of $\Pr[\text{outcome} \mid \text{model is accurate}]$ [5]. Given a list of predicted probabilities $(p_{\text{win}}, p_{\text{tie}}, p_{\text{lose}})$ for each game, this latter value can simply be taken as the product $\prod p_{\text{actual-outcome}}$ over all games played.
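A minimal sketch of this comparison is given below. It assumes each prediction is a triple (p_win, p_tie, p_lose) and each actual outcome is stored as an index 0, 1 or 2 into that triple; this encoding and the function names are illustrative assumptions. The product is accumulated in log space, which avoids numerical underflow when many games are multiplied together.

```python
import math


def log_likelihood(predictions, outcomes):
    """Sum of log p_actual-outcome over all games."""
    return sum(math.log(p[o]) for p, o in zip(predictions, outcomes))


def likelihood_ratio(predictions_a, predictions_b, outcomes):
    """Likelihood of the outcomes under model A divided by that under model B."""
    return math.exp(log_likelihood(predictions_a, outcomes)
                    - log_likelihood(predictions_b, outcomes))


# Hypothetical predictions for two games, both won by the home team.
model = [(0.5, 0.3, 0.2), (0.6, 0.2, 0.2)]
uniform = [(1 / 3, 1 / 3, 1 / 3)] * 2
print(likelihood_ratio(model, uniform, [0, 0]))
```

A ratio above 1 means the observed outcomes were more likely under the first model than under the second.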

2.4 Probability distributions

We consider three different probability distributions: the Poisson distribution, the Negative Binomial distribution, and the logistic distribution (in the form of the Elo rating system).

2.4.1 Univariate Poisson distribution

The Poisson distribution with mean $\lambda \geq 0$ has the probability mass function

\[ \Pr[X = k] = \frac{e^{-\lambda} \lambda^k}{k!} \]

for $k \geq 0$. The Poisson distribution describes the probability of an event occurring k times within some unit of time, where events are independent, occur at a constant rate during the entire interval, and the probability of an event occurring in an interval is proportional to the length of the interval [4]. The rationale behind modelling soccer scores with this distribution is that the game approximately fulfills these properties. However, some of them do not hold exactly. For example, the rate of goals is not constant during a match. Instead, the rate increases during the game, with a significant difference between the two halves of the match. Additionally, there is evidence that the number of goals depends on previously scored goals in the match, with the rate of goals increasing if a team has already scored many goals [6].

Separate attack and defense

Since teams often have different offensive and defensive capabilities, it is reasonable to use separate fitness parameters to describe them. In [1], a model is developed where each team i is assigned two parameters $\alpha_i$ and $\beta_i$, which represent the offensive and defensive capabilities of the team. When home team i meets away team j, the means $\lambda_H = A \cdot \alpha_i \beta_j$ and $\lambda_A = \alpha_j \beta_i$ are used. The final scores in the game are then assumed to be independent random variables with distributions $\mathrm{Po}(\lambda_H)$ and $\mathrm{Po}(\lambda_A)$. Here, $A$ denotes the home advantage of the league, expressed as a ratio between the expected number of goals scored by the home team and goals scored by the away team.

Separate home advantages

In general, home advantage is set to be a constant determined by all the matches in the league. However, in some leagues the home advantage varies significantly between teams. Therefore, this parameter can instead be considered for each team, so that home team i has the home advantage $A_i$.

Censoring

In general, the Poisson model when used for soccer tends to become overdispersed, meaning the data has greater variability than the model expects. One of the causes is the phenomenon where the rate of scoring in a game increases when a team has already scored many goals, resulting in a larger than expected number of results with high goal counts. This can be combated by censoring the data. If a team scores more than some number X of goals, it is only considered to have scored X goals. This is called right-censoring. Using right-censoring has been shown to improve the performance of Poisson models when used for soccer [9].

Bivariate Poisson distribution

The assumption that the numbers of goals scored by the two teams are independent is questionable, and tends to underestimate the number of draws and overestimate the number of wins with a large number of goals [7]. It has been suggested that the situation may be improved by introducing a correlation between the home and away scores in a match between teams i and j by considering a bivariate Poisson distribution $X = U + W$ and $Y = V + W$. Letting $u = \alpha_i \beta_j$, $v = \alpha_j \beta_i$ and $w = \rho \sqrt{uv}$, the random variables $U, V, W$ are independent Poisson variables with means $u - w$, $v - w$ and $w$ respectively. Here, $w$ is the covariance between the random variables $X$ and $Y$ [1], so that $\rho$ is their correlation coefficient. To compute the probability mass function, the following recursion can be used:

\[ \Pr[X = 0, Y = 0] = e^{-(u - w) - (v - w) - w} = e^{-u - v + w} \]

\[ x \Pr[X = x, Y = y] = (u - w) \Pr[X = x - 1, Y = y] + w \Pr[X = x - 1, Y = y - 1] \]

\[ y \Pr[X = x, Y = y] = (v - w) \Pr[X = x, Y = y - 1] + w \Pr[X = x - 1, Y = y - 1] \]
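The recursion for the bivariate probability mass function can be sketched as below. This is an illustration under the definitions above (X = U + W and Y = V + W with independent Poisson components of means u − w, v − w and w); the function name, the table layout and the truncation point `max_goals` are assumptions made for the example.

```python
import math


def bivariate_poisson_table(u, v, w, max_goals=10):
    """Return P with P[x][y] = Pr[X = x, Y = y] for 0 <= x, y <= max_goals."""
    a, b = u - w, v - w                      # means of U and V
    P = [[0.0] * (max_goals + 1) for _ in range(max_goals + 1)]
    P[0][0] = math.exp(-(a + b + w))         # all three components are zero
    for x in range(max_goals + 1):
        for y in range(max_goals + 1):
            if x == 0 and y == 0:
                continue
            if x > 0:
                # Recursion in x; the diagonal term vanishes when y = 0.
                diag = P[x - 1][y - 1] if y > 0 else 0.0
                P[x][y] = (a * P[x - 1][y] + w * diag) / x
            else:
                # First row: recursion in y, where the diagonal term is zero.
                P[x][y] = b * P[x][y - 1] / y
    return P


# Hypothetical means for one match, with covariance 0.2.
table = bivariate_poisson_table(1.4, 1.1, 0.2)
print(table[1][1])
```

Setting w = 0 reduces the table to the product of two independent Poisson distributions, which gives a simple sanity check on the recursion.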

2.4.2 Negative binomial distribution

An alternative to censoring for data where the Poisson fit is poor is to use the Negative Binomial distribution [2]. The distribution has the probability mass function

\[ \Pr[X = k] = \binom{k + r - 1}{k} (1 - p)^r p^k \]

with two parameters $r > 0$ and $p \in (0, 1)$. Unlike the single parameter of the Poisson distribution, the parameters r and p can control both the mean and the variance of the distribution. However, the model does not take into account the different strengths of the teams, since it generally is not used for predictions.
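To make the point about mean and variance concrete, the standard moments of the distribution in this parameterization are shown below; they are textbook results rather than something derived in the thesis.

```latex
\[
  \mathrm{E}[X] = \frac{r p}{1 - p},
  \qquad
  \mathrm{Var}[X] = \frac{r p}{(1 - p)^2} = \frac{\mathrm{E}[X]}{1 - p} > \mathrm{E}[X].
\]
```

The variance therefore always exceeds the mean, whereas a Poisson variable has variance equal to its mean. This is what lets the Negative Binomial distribution absorb the overdispersion observed in goal counts.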

2.4.3 Elo

The Elo rating system is used to determine the relative strength of players in competitive sports. Given two teams with Elo ratings $R_A$ and $R_B$, the expected score of team A is computed as

\[ E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}} \]

and the expected score of team B is $E_B = 1 - E_A$ [10]. If the actual score of team A is $S_A$, its rating is changed to $R_A + K(S_A - E_A)$, for some positive value $K$. Similarly, the rating of team B is changed to $R_B + K(S_B - E_B)$. In general, the score is taken to be $1$, $\frac{1}{2}$, $0$ for a win, tie and loss.

Proportional results

The Elo scoring can be modified by letting the score instead be the proportion of goals scored in the game. If the teams score $G_A$ and $G_B$ goals, their scores are

\[ \frac{G_A}{G_A + G_B} \quad \text{and} \quad \frac{G_B}{G_A + G_B} \]

instead. When $G_A = G_B = 0$, both of these ratios are set to be $\frac{1}{2}$. The domain of the scores is the same, but a win with many goals is considered greater.

Goal differences

In general, the rankings within the Elo models converge too slowly, since very few matches are played by a given team (around 30 per team on average). Therefore, the model can be improved by taking the margin of victory into account by varying the value of K. Instead of letting K be a constant, it can be chosen as a function $K : \mathbb{N} \to \mathbb{R}$ of the goal difference [10]. A win, tie and loss are still counted as scores $1$, $\frac{1}{2}$, $0$ respectively, but now affect the ratings depending on how large the win was.
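The rating update and its two variants can be sketched as follows. The function names, the keyword arguments and the default K = 20 are assumptions made for the example; the function K(D) = 26(D + 1) used in the last call is the one quoted in section 3.1.8.

```python
def expected_score(rating_a, rating_b):
    """Expected score E_A of team A against team B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def actual_score(goals_a, goals_b, proportional=False):
    """Score S_A: either 1 / 0.5 / 0 or the proportion of goals scored."""
    if proportional:
        total = goals_a + goals_b
        return 0.5 if total == 0 else goals_a / total
    if goals_a > goals_b:
        return 1.0
    return 0.5 if goals_a == goals_b else 0.0


def update(rating_a, rating_b, goals_a, goals_b,
           k=20.0, proportional=False, k_func=None):
    """Return the new ratings (R_A, R_B) after one game.

    k_func, if given, maps the absolute goal difference to K.
    """
    if k_func is not None:
        k = k_func(abs(goals_a - goals_b))
    delta = k * (actual_score(goals_a, goals_b, proportional)
                 - expected_score(rating_a, rating_b))
    # S_B - E_B = -(S_A - E_A), so team B moves by the opposite amount.
    return rating_a + delta, rating_b - delta


print(update(1500, 1500, 2, 0))                                  # standard Elo
print(update(1500, 1500, 3, 1, proportional=True))               # proportional results
print(update(1500, 1500, 2, 0, k_func=lambda d: 26 * (d + 1)))   # goal differences
```

Because the two scores and the two expected scores each sum to 1, the total rating in the system is preserved by every update in all three variants.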


Chapter 3 Method

A number of different soccer leagues were selected for analysis. For each league, the results of every game in the 6 most recent years were selected, where available. The leagues were chosen to have large differences in the factors suspected to affect the prediction performance, such as average goals, sample variance in goal scoring and sample variance of the relative strengths of the teams.

3.1 Models

The models were implemented in Python 3, using the NumPy [12] and TensorFlow [13] libraries. These models are referred to as “model 1” through “model 8” in the thesis.

3.1.1 Poisson distribution

The first model uses the Poisson distribution with the separate attack and defense fitness parameters. To determine the parameters α and β, a maximum likelihood estimate is used. The probability of all the outcomes happening, given α, β and A, is given by a product of probabilities of the form

\[ \frac{e^{-\lambda} \lambda^k}{k!} \]

with $\lambda = B \alpha_i \beta_j$ for some i, j and $B \in \{1, A\}$, given by the Poisson distribution. Since the results k are fixed, one can remove the constant factors $1/k!$. Taking logs and negating gives that the values α and β that maximize the likelihood of the outcome are the ones that minimize the loss function

\[ L = \sum \left( B \alpha_i \beta_j - k \log(B \alpha_i \beta_j) \right) \]

where the sum runs over all games in both directions (the home team's goals and the away team's goals), each term supplying its own values of i, j, k and B. This is not analytically solvable [1]. Instead, the parameters are determined using a gradient descent search. The gradient descent search uses the TensorFlow library. To avoid divergence issues during the descent, the slightly modified loss function

\[ L' = L - \sum_i \log(p_f(\alpha_i)) - \sum_i \log(p_f(\beta_i)) \]

was used, where

\[ -\log(p_f(x)) = \begin{cases} 1/x + 4x - 4, & \text{if } x < 0.5 \\ 0, & \text{otherwise.} \end{cases} \]

This has the effect of making sure the αs and βs don't get too small, and in particular never become negative, which can otherwise happen after a few iterations of gradient descent if a team has not yet scored a goal. $p_f$ can be interpreted as a prior on the likelihood of values of α and β. In practice, this should not affect the results since optimal values of α and β lie around 1 or 2.
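The penalized loss and its minimization can be sketched in plain Python as below. This is not the TensorFlow implementation used in the thesis: the gradients are written out by hand, the home advantage is held fixed at a supplied value for brevity, and the step size, iteration count, function names and the toy results are all assumptions made for the example.

```python
import math


def penalty(x):
    """-log p_f(x): zero for x >= 0.5, growing steeply as x approaches 0."""
    return 1.0 / x + 4.0 * x - 4.0 if x < 0.5 else 0.0


def penalty_grad(x):
    """Derivative of penalty(x); it is continuous at x = 0.5."""
    return -1.0 / (x * x) + 4.0 if x < 0.5 else 0.0


def observations(games, home_adv):
    """Each game (home, away, home_goals, away_goals) gives two terms
    (attacker i, defender j, factor B, goals k), one per direction."""
    rows = []
    for home, away, home_goals, away_goals in games:
        rows.append((home, away, home_adv, home_goals))
        rows.append((away, home, 1.0, away_goals))
    return rows


def loss(alpha, beta, rows):
    """Penalized negative log-likelihood L' (constant 1/k! terms dropped)."""
    total = sum(penalty(x) for x in alpha) + sum(penalty(x) for x in beta)
    for i, j, b, k in rows:
        lam = b * alpha[i] * beta[j]
        total += lam - k * math.log(lam)
    return total


def fit(games, n_teams, home_adv, steps=2000, lr=0.01):
    """Plain gradient descent on L', starting from alpha = beta = 1."""
    rows = observations(games, home_adv)
    alpha, beta = [1.0] * n_teams, [1.0] * n_teams
    for _ in range(steps):
        g_alpha = [penalty_grad(x) for x in alpha]
        g_beta = [penalty_grad(x) for x in beta]
        for i, j, b, k in rows:
            g_alpha[i] += b * beta[j] - k / alpha[i]
            g_beta[j] += b * alpha[i] - k / beta[j]
        alpha = [x - lr * g for x, g in zip(alpha, g_alpha)]
        beta = [x - lr * g for x, g in zip(beta, g_beta)]
    return alpha, beta


# Hypothetical results for a three-team league: (home, away, home goals, away goals).
games = [(0, 1, 2, 1), (1, 0, 0, 0), (0, 2, 3, 0),
         (2, 0, 1, 2), (1, 2, 1, 1), (2, 1, 0, 1)]
alpha, beta = fit(games, n_teams=3, home_adv=1.3)
print(alpha, beta)
```

The loss is unchanged when every α is multiplied by a constant and every β is divided by it, so only the products α_i β_j are identified; the starting point and the penalty decide which of the equivalent solutions the descent ends up at.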

3.1.2 Censored Poisson distribution

Model 2 uses the same model and fitness parameters as model 1, but right-censors all the data. The data is censored at 3 goals, so that an observed result of 3 or more goals is recoded as 3 goals. This value was determined by testing several censoring limits and choosing the one that gave the best results.
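The recoding step can be sketched as a simple preprocessing pass. The tuple layout (home, away, home goals, away goals) and the function name are illustrative assumptions.

```python
def censor_games(games, limit=3):
    """Recode every goal count above `limit` as `limit` (right-censoring)."""
    return [(home, away, min(home_goals, limit), min(away_goals, limit))
            for home, away, home_goals, away_goals in games]


# A hypothetical 5-2 result is recoded as 3-2; a 1-0 result is unchanged.
print(censor_games([(0, 1, 5, 2), (1, 0, 1, 0)]))
```

Note that this treats a recoded count as an exact observation of 3 goals, as described above, rather than as the event "3 or more goals".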

3.1.3 Censored Poisson distribution with separate home advantages

This model introduces separate home advantage variables into model 2. The actual home advantages $A_i$ can then be estimated using the same maximum likelihood estimates as for the model without the separate home advantages, meaning they too are selected by a gradient descent search. The loss function of the descent is the same as for the maximum likelihood estimation above:

\[ L = \sum \left( B \alpha_i \beta_j - k \log(B \alpha_i \beta_j) \right) \]

except that B, which takes one of two values depending on whether the home team's or the away team's score is counted, is no longer either 1 or a constant A, but rather either 1 or the variable $A_i$.

3.1.4 Censored bivariate Poisson distribution

The fourth model uses the bivariate Poisson distribution. The covariance between the two Poisson distributions was set to 0.2, which is suggested in [1] to be a good value.

3.1.5 Negative Binomial distribution

The negative binomial model uses the same parameters r and p for each pair of teams. The parameters were determined using a simple hill-climbing algorithm instead of a gradient descent search, since the parameter space has much lower dimension than that of the Poisson models.
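A hill-climbing search over the two parameters might look like the sketch below. The thesis does not describe its exact procedure, so the move set, the step-halving rule, the starting point and the choice of fitting directly to a flat list of goal counts are all assumptions made for illustration.

```python
import math


def nb_log_pmf(k, r, p):
    """log Pr[X = k] = log C(k + r - 1, k) + r log(1 - p) + k log p."""
    log_binom = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return log_binom + r * math.log(1.0 - p) + k * math.log(p)


def log_likelihood(goals, r, p):
    return sum(nb_log_pmf(k, r, p) for k in goals)


def hill_climb(goals, r=5.0, p=0.3, step=0.5, min_step=1e-4, max_iters=10000):
    """Try a step in each direction; halve the step when nothing improves."""
    best = log_likelihood(goals, r, p)
    for _ in range(max_iters):
        if step < min_step:
            break
        improved = False
        moves = ((step, 0.0), (-step, 0.0), (0.0, 0.1 * step), (0.0, -0.1 * step))
        for dr, dp in moves:
            cand_r, cand_p = r + dr, p + dp
            if cand_r <= 0.0 or not 0.0 < cand_p < 1.0:
                continue  # stay inside the valid parameter region
            cand = log_likelihood(goals, cand_r, cand_p)
            if cand > best:
                r, p, best = cand_r, cand_p, cand
                improved = True
        if not improved:
            step /= 2.0
    return r, p, best


# Hypothetical goal counts, one entry per team and game.
goals = [0, 1, 1, 2, 0, 3, 1, 0, 2, 5, 0, 1, 1, 4, 0, 2]
print(hill_climb(goals))
```

With only two parameters, each step costs a handful of likelihood evaluations, which is why such a search is practical here without gradients.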

3.1.6 Elo

The Elo model uses the normal Elo rating system. However, the rating system does not immediately yield the probabilities of a win, tie or loss. In the Elo rating system, the expected score is calculated as the probability of winning, plus half of the probability of getting a tie. In the models used in the thesis, the probability of a tie is computed as

\[ p_T = \frac{\sqrt{E_A E_B}}{2} \]

The win and loss probabilities are computed by proportionally normalizing the expected scores, so that the win probability of the home team is

\[ p_W = E_A \cdot (1 - p_T) \]

and the probability of a loss is

\[ p_L = E_B \cdot (1 - p_T) \]

Additionally, each team has a separate home and away rating, to avoid the need to explicitly compute home advantages. The home advantage is instead reflected by a lower away rating than home rating, where applicable.
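The conversion from ratings to outcome probabilities can be sketched as follows. The function name is hypothetical, and the two arguments stand for the home team's home rating and the away team's away rating.

```python
import math


def elo_outcome_probabilities(home_rating, away_rating):
    """Return (p_W, p_T, p_L) from the home team's point of view."""
    e_home = 1.0 / (1.0 + 10 ** ((away_rating - home_rating) / 400.0))
    e_away = 1.0 - e_home
    p_tie = math.sqrt(e_home * e_away) / 2.0
    # Scale the expected scores so that the three probabilities sum to 1.
    return e_home * (1.0 - p_tie), p_tie, e_away * (1.0 - p_tie)


print(elo_outcome_probabilities(1500, 1500))
```

Under this formula a tie is most likely when the ratings are equal, where its probability is 0.25, and becomes less likely as the rating gap grows.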

3.1.7 Elo with proportional results

This model uses the proportional result method when computing the actual score in a game.

3.1.8 Elo with goal differences

In the goal difference model, the function $K(D) = 26(D + 1)$ of the goal difference $D$ is used, as suggested by [10].

3.2 Data set

In total, games from 17 different leagues were analyzed:

Sweden: Allsvenskan
Sweden: Allsvenskan Women
Sweden: Division 2, Södra Svealand
Sweden: Division 2, Västra Götaland
Germany: Bundesliga
Germany: Bundesliga Women
Germany: Junior Bundesliga South
Germany: 3-liga
USA: Major League Soccer
Brazil: Series A
Brazil: Series B
South Africa: Premier League
Egypt: Premier League
Malaysia: Super League
Norway: Tippeligaen
Norway: Toppserien Women
Spain: Superliga Women

The countries were chosen to be geographically distributed, with leagues from various continents, and a large span of FIFA world rankings. 13 men's leagues and 4 women's leagues were chosen. Among the men's leagues, some were the top leagues of their country and some were not. Additionally, the leagues have different average home advantages and numbers of teams.

Figure 3.1. Histogram of the average number of goals in the chosen odd years of the leagues. The sample ensured a wide range in this metric.

The results of games were scraped from the site betexplorer.com using a custom-built Python script. The best odds for each match were also collected, where available.

Figure 3.2. Histogram of the sample standard deviation in home team rating in the chosen odd years of the leagues, as computed by the Elo model described in 3.1.8. This approximates the spread in skill within a league. These differences in skill had a reasonably wide spread in our data set.

3.3 Evaluation

For each match in a certain year, the models were trained using all the data up to this match. Each model then gave a prediction as probabilities for the home team to win, lose or tie the game. The first half of these predictions were then discarded, since the interest lies in how good the models were when sufficiently trained. To evaluate the fitness of a model, the likelihood ratio of the actual results was computed. To make results comparable across leagues with different numbers of teams, we then took the nth root of the ratios, where n is the number of games played. This corresponds to taking geometric averages of likelihoods rather than simple products.

Likelihood ratios were taken against two sources of probabilities: a constant probability estimate, and probabilities derived from odds given by betting firms. In the first case, the probability estimates were set to $(p_{\text{win}}, p_{\text{tie}}, p_{\text{lose}})$, the average chances of wins, ties and losses respectively for the home team over the whole series. For the second, note that the odds o of an outcome with probability p should usually be set such that the expected winnings $1 - op$ of the betting firm per unit staked are just above 0. Hence it can be assumed that p should be proportional to $1/o$, and normalizing these probabilities such that they sum to 1 gives a useful near-optimal estimate to compare with.

One advantage of using likelihood ratios in this case was that it allowed estimating the performance of the Elo models. This is harder to do with other methods, since Elo does not describe an actual probability distribution on the number of goals, but only estimates probabilities of wins/ties/losses for each game.
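The two evaluation steps, turning odds into reference probabilities and taking the nth root of the likelihood ratio, can be sketched as below. The function names, the outcome encoding (an index 0, 1 or 2 into each probability triple) and the numbers in the usage lines are illustrative assumptions.

```python
import math


def odds_to_probabilities(odds):
    """Normalize 1/o over the three outcomes so the values sum to 1."""
    inverse = [1.0 / o for o in odds]
    total = sum(inverse)
    return [x / total for x in inverse]


def performance(model_probs, reference_probs, outcomes):
    """n-th root of the likelihood ratio of the model against a reference.

    Equivalent to the geometric mean of the per-game probability ratios.
    """
    log_ratio = sum(math.log(m[o] / r[o])
                    for m, r, o in zip(model_probs, reference_probs, outcomes))
    return math.exp(log_ratio / len(outcomes))


# Hypothetical decimal odds for (home win, tie, home loss).
print(odds_to_probabilities([1.9, 3.8, 3.8]))

# Hypothetical predictions for two games, both won by the home team.
model = [(0.5, 0.25, 0.25)] * 2
baseline = [(0.4, 0.3, 0.3)] * 2
print(performance(model, baseline, [0, 0]))
```

A performance of 1 means the model is, per game, exactly as good as the reference; values above 1 favour the model.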

3.4 Validation

First, the performance of all the models on the odd years of the data was computed. These results were then analyzed to form hypotheses about the data. These hypotheses were then evaluated using the even years, so as to avoid problems with overfitting our models and finding spurious correlations.

Chapter 4 Results and discussion

4.1 Results

Every model was evaluated on every league, producing a performance of the model on each league, defined as the geometrically averaged likelihood ratio against a naive constant probability model, as described in section 3.3. These results are available in appendix A. In general, the performance values of the models were all quite low, with average values around 1.05. However, the ratios against the odds-derived probabilities were not that much worse, leading us to theorize that these low scores are simply due to the inherent unpredictability of soccer. Except for the Negative Binomial model, which did not take into account differences in team strengths, all models had better performance than the naive model.

Model        1       2       3       4       5       6       7       8
Performance  1.0596  1.0602  1.0526  1.0585  0.9912  1.0147  1.0139  1.0420

Figure 4.1. The average performance of the eight models. As expected, the negative binomial model performed the worst, the Elo models somewhere in the middle, and the four Poisson models performed the best.

The use of censoring improved the situation a bit, although by much less than expected. Generally, model 2, which was the censored variant of our basic Poisson model 1, performed marginally better than model 1, except for a few outliers where it performed significantly worse.

Figure 4.2. The differences in performance of model 1 and model 2. Seen as a cumulative distribution function, this roughly appears to follow a normal distribution with σ = 0.0028 and µ = 0.0012. Using a t-test, the data shows that model 2 outperforms model 1 with p ≤ 0.05.

The performance of the censored Poisson model 2 was clearly correlated with two properties: the average number of goals, with an $r^2$ value of 0.431 (figure 4.3), and the standard deviation of the home team Elo ratings in the league, with an $r^2$ value of 0.686 (figure 4.4).
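For reference, an $r^2$ value of this kind can be computed as the square of the Pearson correlation coefficient between a league property and the model performance. The sketch below uses made-up numbers, not the thesis data, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: one entry per league and year.
avg_goals = [2.4, 2.7, 3.1, 3.4]
model_performance = [1.03, 1.06, 1.05, 1.09]
print(pearson_r(avg_goals, model_performance) ** 2)
```

For a simple linear fit with one explanatory variable, this squared coefficient equals the fraction of the variance in performance explained by the property.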

Figure 4.3. Performance of model 2 as a function of the average number of goals, showing a clear linear correlation between the two.


Figure 4.4. Performance of model 2 as a function of the standard deviation of the home team Elo ratings, showing an even stronger linear correlation than when compared to the average number of goals.

Both of these correlations have reasonable explanations. In a league where the variance in team strengths is high, results become easier to predict, since stronger teams more consistently beat weaker ones. When the number of goals in a league is high, each match tends to be a better approximation of the true relative strength of the two teams. For example, a 6-6 score is better evidence of two teams being equal in strength than a 0-0 score, which might just as well have been a 0-1 score.

However, these two properties are themselves correlated, with a linear correlation coefficient of 0.472 (figure 4.5). This correlation between the average number of goals and the home rating standard deviation likely stems from the fact that in leagues where teams have wildly different strengths, very strong teams will often meet very weak teams. In such a game, the number of goals is expected to be large, since the number of goals depends on the difference in team strengths.

When controlling for the correlation with rating standard deviation, the effect of goals on model performance still appears to be present. However, it is much weaker, and more data would be needed to establish statistical significance.

Figure 4.5. Home rating standard deviation against the number of goals. A clear correlation can be observed.

Figure 4.6. Performance of model 2 as a function of the average number of goals, when only looking at leagues with a home rating standard deviation in the range 65–120. This range was chosen from figure 4.4 as having a fair number of points, while showing no clear correlation with the performance of model 2. The graph shows a weak upwards trend. A similar but stronger effect was observed on the test data set.


The bivariate model 4 performed clearly worse than the univariate model 2 (figure 4.7), but with small, noisy differences. Assuming that the differences follow a normal distribution, this results in a 95% confidence interval of the average difference as −0.0076 ± 0.0034. This is interesting, since one of the motivations behind the bivariate model is that a match does actually consist of two teams. It would therefore not be implausible that their scores in a match have some positive dependence. If this was the case, this model would be expected to perform better in predictions as well, since it would more accurately represent the underlying distribution of the scoring. However, our results seem to indicate that this is not the case. It could be the case that the censoring of the scores lessens the effects that the bivariate model tries to combat. The paper [1] that suggested the bivariate model did not compare its performance with a censored version. However, the difference in performance between model 4 and model 2 did show a correlation with the proportion of ties in a league. A large number of ties might indicate a larger correlation of scores in each match, so this gives some credence to the belief that there are leagues that do exhibit positive correlation of scores in each match, where a bivariate model can help.

Figure 4.7. The difference in performance between model 4 and model 2, plotted against the proportion of ties in the league. A majority of points lie below y = 0, indicating that the univariate model 2 performs better, but this advantage becomes smaller with a larger proportion of ties.


4.2 Limitations

The amount of data was too small to control properly for confounding variables, such as the strong correlation between the average number of goals and the home Elo ratings. This made it hard to determine which of these factors actually affected the performance in a statistically significant way.

4.3 Future research

An interesting development would be to extend the Negative Binomial model in the same way as the Poisson model, taking into account that different teams have different fitness. However, this would require constructing a combined measure of variance. As stated in the limitations, a larger amount of data would be needed to draw firm conclusions. A further study should therefore increase the number of years and leagues analyzed.

4.4 Conclusion

Using only the past results in a league to predict future results is only slightly better than using a fixed prediction corresponding to the proportion of wins, ties and losses among all the matches in the league. There are differences in predictability between leagues. A high average number of goals and a high variance in performance among the teams in a league both correlate with the league being easier to predict. At the p ≤ 0.05 significance level, introducing censoring in the data improves a basic Poisson model marginally, and a bivariate Poisson model with a covariance of 0.2 is marginally worse than the univariate Poisson model. However, the difference in performance between the bivariate and the univariate Poisson models is correlated with the proportion of ties in a league, with the bivariate model improving relative to the univariate model when a league has many ties.


Bibliography

[1] M.J. Maher. “Modelling association football scores”. In: Statistica Neerlandica 36.3 (1982), pp. 109–118.
[2] R. Pollard. “69.9 Goal-Scoring and the Negative Binomial Distribution”. In: The Mathematical Gazette 69.447 (1985), pp. 45–47.
[3] R. Pollard. “Home advantage in soccer: A retrospective analysis”. In: Journal of Sports Sciences 4.3 (1986), pp. 237–248.
[4] J. B. Keller. “A Characterization of the Poisson Distribution and the Probability of Winning a Game”. In: The American Statistician 48.4 (1994), pp. 294–298.
[5] G. Casella and R. Berger. Statistical Inference. Duxbury Resource Center, June 2001.
[6] G. Dickson, G.A. Abt and W.K. Mummery. “16 Goal Scoring Patterns Over the Course of a Match: An Analysis of the Australian National Soccer League”. In: Science and football IV (2002), p. 106.
[7] I. McHale and P. Scarf. “Modelling soccer matches using bivariate discrete distributions with general dependence structure”. In: Statistica Neerlandica 61.4 (2007), pp. 432–445.
[8] A. Heuer and O. Rubner. “Fitness, chance, and myths: an objective view on soccer results”. In: The European Physical Journal B 67.3 (2009), pp. 445–458.
[9] D. van Gemert and J.C.M. van Ophem. “Modelling the Scores of Premier League Football Matches”. In: Economerics 18 (2010), p. 67.
[10] L. M. Hvattum and H. Arntzen. “Using ELO ratings for match result prediction in association football”. In: International Journal of Forecasting 26.3 (2010), pp. 460–470.
[11] R. Giulianotti. “Football”. In: The Wiley-Blackwell Encyclopedia of Globalization. John Wiley & Sons, Ltd, 2012.
[12] Numpy. url: http://www.numpy.org/ (visited on 04/18/2016).
[13] TensorFlow – an Open Source Software Library for Machine Intelligence. url: https://www.tensorflow.org/ (visited on 04/18/2016).


Appendix A. Model performances

Rows are leagues and years; columns 1–8 are the models.

League / year                 1      2      3      4      5      6      7      8
BR - Serie A
  2010                    0.990  0.992  0.995  0.992  0.930  0.922  0.983  0.961
  2012                    1.034  1.032  1.025  1.034  0.987  0.980  1.041  1.004
  2014                    1.026  1.027  1.019  1.028  0.975  0.957  1.003  0.972
BR - Serie B
  2010                    0.970  0.975  0.965  0.974  0.935  0.924  0.958  0.971
  2012                    1.045  1.042  1.037  1.041  0.999  0.985  1.044  1.008
  2014                    1.014  1.007  0.995  1.007  0.956  0.941  0.992  0.997
EG - Premier League
  2010                    1.017  1.023  1.025  1.023  0.958  0.941  1.012  0.922
  2012                    1.017  1.029  1.048  1.030  0.951  0.950  0.979  0.830
  2014                    0.953  0.951  0.943  0.949  0.953  0.941  0.973  0.949
DE - Bundesliga
  2010                    1.029  1.035  1.039  1.029  1.010  1.014  1.052  0.981
  2012                    0.994  1.019  1.005  1.017  0.992  0.982  1.012  1.009
  2014                    1.095  1.094  1.092  1.088  1.050  1.038  1.063  1.014
DE - 3-liga
  2010                    0.963  0.956  0.939  0.955  0.939  0.934  0.959  0.994
  2012                    0.982  0.989  0.999  0.988  0.927  0.918  0.959  0.918
  2014                    0.973  0.984  0.965  0.982  0.982  0.978  0.958  1.005
DE - Jr. League South
  2010                    1.090  1.095  1.072  1.080  1.072  1.069  1.024  0.947
  2012                    0.959  0.970  0.944  0.968  0.990  0.983  0.960  0.991
  2014                    1.154  1.105  1.099  1.093  1.080  1.078  1.079  0.997
DE - Bundesliga Women
  2010                    1.290  1.302  1.304  1.307  1.164  1.197  1.209  1.099
  2012                    1.060  1.076  1.037  1.080  1.108  1.142  1.107  0.998
  2014                    1.264  1.296  1.307  1.296  1.145  1.180  1.237  1.096
MY - Super League
  2010                    1.107  1.099  1.074  1.107  1.064  1.064  1.092  1.009
  2012                    1.108  1.103  1.106  1.102  1.052  1.053  1.075  1.023
  2014                    0.943  0.945  0.923  0.942  0.911  0.911  0.957  0.994
NO - Toppserien Women
  2010                    1.246  1.197  1.175  1.189  1.116  1.144  1.125  1.035
  2012                    1.378  1.310  1.317  1.300  1.198  1.235  1.264  1.105
  2014                    1.157  1.141  1.125  1.144  1.037  1.072  1.104  0.959
NO - Tippeligaen
  2010                    1.062  1.050  1.061  1.047  0.972  0.963  1.045  0.978
  2012                    1.061  1.066  1.068  1.071  0.984  0.978  1.031  0.968
  2014                    1.038  1.039  1.040  1.037  0.994  0.984  1.015  0.979
ZA - Premier League
  2010                    0.991  0.995  0.997  0.997  0.944  0.937  0.962  0.927
  2012                    1.017  1.016  1.010  1.014  0.999  0.983  1.025  0.978
  2014                    1.021  1.027  1.015  1.025  0.997  0.985  1.010  0.985
ES - Superliga Women
  2010                    1.351  1.308  1.311  1.304  1.205  1.221  1.253  1.076
  2012                    1.150  1.125  1.122  1.122  1.060  1.050  1.127  0.987
SE - Allsvenskan
  2010                    0.984  0.995  0.984  0.993  0.967  0.950  0.987  0.984
  2012                    0.992  1.004  1.012  1.000  0.923  0.918  0.984  0.962
  2014                    0.988  0.996  1.007  0.995  0.953  0.949  0.994  0.945
SE - Div. 2, S. Svealand
  2010                    0.956  0.961  0.956  0.957  0.953  0.935  0.996  0.952
  2012                    1.104  1.110  1.106  1.109  1.033  1.045  1.068  1.026
  2014                    1.013  1.066  1.045  1.067  1.047  1.045  1.053  0.999
SE - Div. 2, V. Götaland
  2010                    1.059  1.061  1.042  1.056  1.024  1.022  0.994  0.992
  2012                    1.002  1.004  0.991  1.012  0.977  0.969  1.018  0.995
  2014                    1.068  1.063  1.053  1.066  1.048  1.055  1.058  1.003
SE - Allsvenskan Women
  2010                    1.106  1.122  1.106  1.128  1.111  1.109  1.120  1.033
  2012                    1.049  1.076  1.048  1.069  1.044  1.047  1.052  1.011
  2014                    1.206  1.192  1.184  1.170  1.119  1.156  1.121  1.084
US - MLS
  2010                    1.021  1.026  1.012  1.024  0.990  0.982  1.020  0.995
  2012                    0.961  0.976  0.956  0.978  0.970  0.952  1.007  0.975
  2014                    0.922  0.938  0.930  0.941  0.939  0.929  0.938  0.936
