Football Predictions Don’t Work!
(well, not when we code them in MatLab anyway…)
Alex, Philip, Kate, Bobby
Introduction
Aim: To produce, and (if possible) gauge the accuracy of, a variety of statistical models to predict forthcoming football fixtures.
Models derived:
Baby model
2 x general linear models
The “Baby” Model
Just to get us off the ground… Consider only goals/match in all matches played by each team, and fit a Poisson Distribution to this data.
Q. Do home teams obtain an advantage from the home crowd?
Should we consider away games for our home team, and vice versa?
The “Baby” Model
A t-test of goals scored in home and away matches shows no significant difference for almost every team.
We use a paired t-test (the MatLab TTEST function) to compare each team's home and away goal counts, testing H0: no difference in mean goals between home and away games.
TTEST returns h = 0 (H0 not rejected) for the teams in our fixture list, with the exception of Portsmouth.
Hence we use all available goals/match data for each team.
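As a rough illustration of the paired comparison above, the sketch below computes the paired t statistic by hand in Python. It is not the MatLab TTEST call itself. The goal counts are made-up placeholder values, not the project data, and the critical value 2.262 assumes 10 pairs (9 degrees of freedom) and a two-sided test at the 5% level.

```python
import math
import statistics

def paired_t_statistic(home_goals, away_goals):
    """Return (t, degrees of freedom) for a paired t-test on two equal-length samples."""
    if len(home_goals) != len(away_goals):
        raise ValueError("paired samples must have the same length")
    diffs = [h - a for h, a in zip(home_goals, away_goals)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical goal counts for one team (placeholder values only).
home = [2, 1, 0, 3, 1, 2, 0, 1, 2, 1]
away = [1, 1, 0, 2, 2, 1, 0, 1, 1, 2]

t_stat, dof = paired_t_statistic(home, away)
T_CRIT_5PCT_DF9 = 2.262  # two-sided 5% critical value for 9 degrees of freedom
reject_h0 = abs(t_stat) > T_CRIT_5PCT_DF9  # plays the role of TTEST's h output
print(f"t = {t_stat:.3f}, dof = {dof}, reject H0: {reject_h0}")
```

With a different number of matches the critical value changes, so in practice a library routine that returns a p-value is the safer choice.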
The “Baby” Model
We plot data for each team on a histogram, and approximate X ~ Po(λ) using the data mean λ.
[Figure: histogram of sample data with mean λ, alongside simulated data from X ~ Po(λ)]
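A minimal Python sketch of the fitting step: estimate λ as the sample mean of goals per match, then compare the observed counts with the counts a Po(λ) distribution would expect. The goal list is hypothetical placeholder data, and the function name poisson_pmf is ours for illustration.

```python
import math
from collections import Counter

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Hypothetical goals-per-match record for one team (placeholder values only).
goals = [0, 1, 1, 2, 0, 3, 1, 2, 1, 0, 2, 1]

lam = sum(goals) / len(goals)  # the Poisson parameter is estimated by the data mean
observed = Counter(goals)
n_matches = len(goals)

# Side-by-side comparison, i.e. the information the two histograms convey.
for k in range(max(goals) + 1):
    expected = n_matches * poisson_pmf(k, lam)
    print(f"{k} goals: observed {observed[k]}, expected {expected:.2f}")
```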
The “Baby” Model
We set up a matrix of possible results from nil-all (0:0) to 9:9 (unlikely!).
Results are generated for each forthcoming fixture using probabilities from the Poisson simulated data.
Fixture                  P(Home win)
B’ham vs L’pool          0.27
Bolton vs M’boro         0.38
Everton vs Chelsea       0.26
P’mth vs A.Villa         0.37
Man City vs Man Untd     0.31

+ Very simple to code up
- Neglects all other information available to us as irrelevant (clearly not true!).
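The simulation idea behind these probabilities can be sketched in Python as follows. This is an illustration under stated assumptions, not the MatLab code behind the table: the λ values are hypothetical, the sampler is Knuth's multiplication method, and the number of simulations and the seed are arbitrary choices.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Po(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def simulate_fixture(lam_home, lam_away, n_sims=20000, seed=0):
    """Estimate P(home win), P(draw), P(away win) from simulated scorelines."""
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(n_sims):
        h = sample_poisson(lam_home, rng)
        a = sample_poisson(lam_away, rng)
        if h > a:
            home_wins += 1
        elif h == a:
            draws += 1
        else:
            away_wins += 1
    return home_wins / n_sims, draws / n_sims, away_wins / n_sims

# Hypothetical goal rates for a home and an away team (placeholder values only).
p_home, p_draw, p_away = simulate_fixture(lam_home=1.2, lam_away=1.5)
print(f"P(home win) = {p_home:.2f}, P(draw) = {p_draw:.2f}, P(away win) = {p_away:.2f}")
```

Because the two teams are sampled separately, this sketch treats their goal counts as independent.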
Okay, so this one did work…
Fixture                  Predicted Result
B’ham vs L’pool          1 – 1
Bolton vs M’boro         1 – 1
Everton vs Chelsea       1 – 1
P’mth vs A.Villa         0 – 1
Man City vs Man Untd     1 – 1
Multiple Linear Regression
Used full data set to calculate values for betas (Home and Away).
e.g. Consider Birmingham’s matches:
Calculated lambda for every game and took the average.
Used the averaged value of lambda to generate a Poisson Distribution.
Repeated for the away team using the away data set.
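The "lambda for every game, then average" step can be sketched in Python as below. The slides do not say which explanatory variables were used, so the two columns here, and all the numbers, are hypothetical; the betas are assumed to have been estimated already.

```python
def linear_predictor(betas, features):
    """beta0 + beta1*x1 + ... + betap*xp for a single match."""
    intercept, *slopes = betas
    if len(slopes) != len(features):
        raise ValueError("need one slope per explanatory variable")
    return intercept + sum(b * x for b, x in zip(slopes, features))

def average_lambda(betas, matches):
    """Compute lambda for every match, then average over matches."""
    lambdas = [linear_predictor(betas, row) for row in matches]
    return sum(lambdas) / len(lambdas)

# Hypothetical fitted betas (beta0, beta1, beta2) for the home data set.
home_betas = [0.9, 0.05, -0.02]

# Hypothetical explanatory variables for three home matches (placeholder values only).
birmingham_home_matches = [
    [4.0, 10.0],
    [6.0, 3.0],
    [2.0, 12.0],
]

lam_home = average_lambda(home_betas, birmingham_home_matches)
print(f"averaged lambda = {lam_home:.3f}")
```

The averaged value then serves as the single Poisson parameter for that team; the same calculation with the away betas and away matches gives the away team's parameter.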
Multiple Linear Regression
Probability of any result = product of individual probabilities
e.g. P(Birmingham 0 : Liverpool 0) = P(Birmingham(0)) * P(Liverpool(0))
Assumed independent events

Fixture                  P(Home win)
B’ham vs L’pool          0.28
Bolton vs M’boro         0.32
Everton vs Chelsea       0.25
P’mth vs A.Villa         0.42
Man City vs Man Untd     0.27
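Under the independence assumption, the whole table of scoreline probabilities is the outer product of the two teams' Poisson probabilities, and P(Home win) is the sum of the cells where the home team scores more. A minimal Python sketch, with a hypothetical λ for each team and scores capped at 9 goals each:

```python
import math

MAX_GOALS = 9  # scorelines from 0:0 up to 9:9

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def score_matrix(lam_home, lam_away, max_goals=MAX_GOALS):
    """matrix[i][j] = P(home scores i) * P(away scores j), assuming independence."""
    home = [poisson_pmf(i, lam_home) for i in range(max_goals + 1)]
    away = [poisson_pmf(j, lam_away) for j in range(max_goals + 1)]
    return [[ph * pa for pa in away] for ph in home]

def outcome_probabilities(matrix):
    """Return (P(home win), P(draw), P(away win)) from a scoreline matrix."""
    size = len(matrix)
    p_home = sum(matrix[i][j] for i in range(size) for j in range(size) if i > j)
    p_draw = sum(matrix[i][i] for i in range(size))
    p_away = sum(matrix[i][j] for i in range(size) for j in range(size) if i < j)
    return p_home, p_draw, p_away

# Hypothetical averaged lambdas for the home and away team (placeholder values only).
matrix = score_matrix(lam_home=1.1, lam_away=1.4)
p_home, p_draw, p_away = outcome_probabilities(matrix)
print(f"P(0:0) = {matrix[0][0]:.3f}, P(home win) = {p_home:.2f}")
```

The three outcome probabilities sum to slightly less than 1 because scorelines above 9 goals are cut off.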
And okay, this one did too…
Fixture                  Predicted Result
B’ham vs L’pool          1 – 1
Bolton vs M’boro         1 – 1
Everton vs Chelsea       1 – 2
P’mth vs A.Villa         1 – 1
Man City vs Man Untd     1 – 2
2nd General Linear Model
Uses MatLab function GLMFIT, and more parameters than just goals/match!
Uses different values of regression parameters β0,…, βp for each home and away team.
+ Fairly realistic, as teams are influenced to varying degrees by different explanatory variables.
- Reduces the data available for estimating each team's β0,…, βp, and hence our confidence in the model’s accuracy.
2nd General Linear Model
e.g. Birmingham vs Liverpool
We obtain β0,…, βp for B’ham using explanatory training data for all matches where B’ham plays at home, EXCEPT for data from past B’ham vs Liverpool matches, which we want to use as our test data.
Obtaining β0,…, βp for each team should allow us to evaluate Poisson parameters for the model.
Problems:
We obtain Poisson parameters less than 1 (including some which are less than zero).
This model will predict nil-nil draws for nearly all its matches.
Something’s gone wrong… not sure what…
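One thing that may be worth checking, offered as a possibility rather than a diagnosis: GLMFIT with a 'poisson' distribution uses a log link by default, so β0 + β1x1 + … + βpxp is the log of the Poisson mean, not the mean itself. A negative value is legitimate on that scale and turns into a positive mean once exponentiated. The Python sketch below illustrates the difference with hypothetical betas and a hypothetical explanatory variable, and also shows why any Poisson parameter below 1 makes zero goals the single most likely count.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_mean_log_link(betas, features):
    """Return (linear predictor, Poisson mean) for a log-link Poisson GLM."""
    eta = betas[0] + sum(b * x for b, x in zip(betas[1:], features))
    return eta, math.exp(eta)  # the mean is exp(eta), so it is always positive

def most_likely_goals(lam, max_goals=9):
    """Goal count with the highest Poisson probability, searched over 0..max_goals."""
    return max(range(max_goals + 1), key=lambda k: poisson_pmf(k, lam))

# Hypothetical betas (beta0, beta1) and one hypothetical explanatory variable.
eta, mean_goals = poisson_mean_log_link(betas=[-0.4, 0.1], features=[1.0])
print(f"linear predictor = {eta:.2f}, Poisson mean = {mean_goals:.2f}")
print(f"most likely goals at that mean: {most_likely_goals(mean_goals)}")
```

If the fitted values were already on the mean scale, this would not explain the negative parameters, and the cause would have to be looked for elsewhere.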