Advances and Applications in Statistics

Advances and Applications in Statistics is an internationally recognized journal indexed in the Emerging Sources Citation Index (ESCI). It provides a platform for original research papers and survey articles in all areas of statistics, both computational and experimental.

THE EFFECTS OF DECOMPOSITION OF THE GOALS SCORED IN CLASSIFYING THE OUTCOMES OF FIVE ENGLISH PREMIER LEAGUE SEASONS USING MACHINE LEARNING MODELS

Authors

  • Tomilayo P. Iyiola
  • Hilary I. Okagbue
  • Adedayo F. Adedotun
  • Toluwalase J. Akingbade

Keywords:

algorithm, classification, cross-validation, English Premier League, feature selection, machine learning, statistics

DOI:

https://doi.org/10.17654/0972361723026

Abstract

The English Premier League (EPL) is one of the best football championships in the world, and the data it generates are therefore highly sought after by users of football data. One use of these data is predicting the outcomes of league matches. This paper applies four machine learning (ML) models to classify the outcome (home win, draw, or away win) of matches from five consecutive EPL seasons using only six independent variables. Two feature selection algorithms, Information Gain Ratio (IGR) and ReliefF, reduced the independent variables from 16 to 6. The Spearman rank correlation showed a highly significant positive correlation between the rankings produced by the two feature selection algorithms. The Kruskal-Wallis H test indicated a significant difference in the dependent variable between seasons (chi-square = 15.36, degrees of freedom = 4, P = 0.004). Adaptive boosting (AB), gradient boosting (GB), logistic regression (LR) and random forests (RF) were used to classify the outcome from the six independent variables, and the performance metrics showed perfect classification in almost all the models. The paper concludes that the number of goals scored by the home and away teams, together with the goals each team scored in the first and second halves, is all that is needed to classify EPL outcomes correctly. Furthermore, own goals, goals scored from penalties, and the yellow and red cards received by the home or away teams are not necessarily needed to determine or predict EPL outcomes.
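Two points in the abstract can be illustrated with a short Python sketch. It is not the authors' code, and it assumes that the six retained variables are the full-time, first-half and second-half goals of each side, as the conclusion suggests. The function and argument names are hypothetical. The first function shows that, under this assumption, the match outcome is a deterministic function of the goal counts, which would be consistent with the near-perfect classification reported. The second uses the closed-form upper tail of the chi-square distribution for even degrees of freedom to check the reported Kruskal-Wallis P value.

```python
import math


def outcome(home_first, home_second, away_first, away_second):
    """Label a match from half-by-half goals (hypothetical argument names)."""
    home_total = home_first + home_second
    away_total = away_first + away_second
    if home_total > away_total:
        return "home win"
    if home_total < away_total:
        return "away win"
    return "draw"


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, valid for even df only.

    For df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only holds for positive even df")
    half = x / 2.0
    terms = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * terms


# Values taken from the abstract: H = 15.36 with 4 degrees of freedom.
p_value = chi2_sf_even_df(15.36, 4)

print(outcome(1, 1, 0, 2))  # draw (2 goals each)
print(round(p_value, 3))    # 0.004
```

The computed tail probability rounds to 0.004, which agrees with the P value quoted for the Kruskal-Wallis test.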

Received: September 3, 2022; Revised: October 19, 2022; Accepted: December 5, 2022; Published: April 15, 2023

Published

24-09-2025

Section

Articles

How to Cite

THE EFFECTS OF DECOMPOSITION OF THE GOALS SCORED IN CLASSIFYING THE OUTCOMES OF FIVE ENGLISH PREMIER LEAGUE SEASONS USING MACHINE LEARNING MODELS. (2025). Advances and Applications in Statistics, 87(1), 13-27. https://doi.org/10.17654/0972361723026
