Applying machine learning to event data in soccer
Author(s)
Kerr, Matthew G. S. (Matthew George Soeryadjaya)
Alternative title
Application of machine learning technique to discover useful knowledge from event data in soccer
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
John V. Guttag.
Abstract
Soccer is the world's most popular sport, but published research in soccer analytics has yet to attain the same level of sophistication as analytics in other professional sports. We hope to discover new knowledge from a soccer ball-event dataset by applying different machine learning techniques. In this thesis we present three experiments that address three interesting questions in soccer involving game prediction and team style. We approach each question by constructing features from the ball-event data, where an event is a pass, shot, etc., and applying machine learning algorithms. In the first experiment, we construct three models that use different features to predict which team won a given game, without any knowledge of goals. We achieve a top accuracy rate of 0.84 using an L2-regularized logistic regression classifier. We also investigate the feature weights to learn relationships between game events and a team's chances of success. In the second experiment, we try several classifiers to predict which team produced the sequence of ball-events that occurred during a game. Despite the relatively small number of events per game, we achieve an accuracy rate of 0.345 for a 20-team classification task when using an RBF SVM. By learning which sequences are characteristic of teams, we are potentially able to discover whether successful teams have a common style. We also learn the efficacy of transforming ball-events into predefined symbols. Finally, in the third experiment, we predict which team attempted a given set of passes. We first construct 2D histograms of the locations of the origins of the passes. We then use the histograms as features in a 20-team classification task and discover that teams have characteristic passing styles, achieving an accuracy rate of 0.735 using a learned K-NN classifier. The results demonstrate that approaching soccer analytics with a machine learning framework is effective. In addition to achieving good classification performance, we are able to discover useful, potentially actionable, knowledge by investigating the models and features that we construct.
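The third experiment's pipeline (2D histograms of pass origins used as features for a nearest-neighbour classifier) can be illustrated with a minimal sketch. This is not the thesis's implementation: the pitch coordinate range, the bin counts, the synthetic two-team data, and the helper names `pass_origin_histogram`, `knn_predict`, and `synthetic_passes` are all assumptions made for illustration, and a plain Euclidean k-NN stands in for the learned K-NN classifier mentioned in the abstract.

```python
import math
import random
from collections import Counter

# Assumed coordinate system: x and y both run from 0 to 100.
PITCH_LENGTH = 100.0
PITCH_WIDTH = 100.0


def pass_origin_histogram(origins, bins_x=6, bins_y=4):
    """Return a flattened, normalized 2D histogram of pass origins (x, y)."""
    counts = [0.0] * (bins_x * bins_y)
    for x, y in origins:
        # Clamp the top edge of the pitch into the last bin.
        i = min(int(x / PITCH_LENGTH * bins_x), bins_x - 1)
        j = min(int(y / PITCH_WIDTH * bins_y), bins_y - 1)
        counts[i * bins_y + j] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def knn_predict(train, query, k=3):
    """Majority vote among the k training histograms closest to `query`.

    `train` is a list of (feature_vector, team_label) pairs; distance is
    plain Euclidean, not a learned metric.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def synthetic_passes(rng, centre_x, n=200):
    """Hypothetical data: pass origins clustered around `centre_x`."""
    return [
        (min(max(rng.gauss(centre_x, 10.0), 0.0), PITCH_LENGTH),
         rng.uniform(0.0, PITCH_WIDTH))
        for _ in range(n)
    ]


rng = random.Random(0)
train = []
# Two made-up teams: one passes from deep, one passes from high up the pitch.
for team, centre in (("deep_team", 30.0), ("high_team", 70.0)):
    for _ in range(5):
        train.append((pass_origin_histogram(synthetic_passes(rng, centre)), team))

# A new set of passes generated like the high-pressing team's.
query = pass_origin_histogram(synthetic_passes(rng, 70.0))
predicted = knn_predict(train, query)
print(predicted)
```

Normalizing each histogram makes sets with different numbers of passes comparable, which matters when the classifier is given an arbitrary "set of passes" rather than a full game.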
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Title as it appears in MIT Commencement Exercises program, June 5, 2015: Application of machine learning technique to discover useful knowledge from event data in soccer. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 69-70).
Date issued
2015
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.