Applying machine learning to event data in soccer

Author(s)

Kerr, Matthew G. S. (Matthew George Soeryadjaya)

Alternative title

Application of machine learning technique to discover useful knowledge from event data in soccer

Other Contributors

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.

Advisor

John V. Guttag.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

Soccer is the world's most popular sport, but published research in soccer analytics has yet to attain the same level of sophistication as analytics in other professional sports. We hope to discover new knowledge from a soccer ball-event dataset by applying different machine learning techniques. In this thesis we present three experiments that address three questions in soccer involving game prediction and team style. We approach each question by constructing features from the ball-event data, where an event is a pass, shot, etc., and applying machine learning algorithms. In the first experiment, we construct three models that use different features to predict which team won a given game, without any knowledge of goals. We achieve a top accuracy rate of 0.84 using an L2-regularized logistic regression classifier. We also investigate the feature weights to learn relationships between game events and a team's chances of success. In the second experiment, we try several classifiers to predict which team produced the sequence of ball-events that occurred during a game. Despite the relatively small number of events per game, we achieve an accuracy rate of 0.345 for a 20-team classification task when using an RBF SVM. By learning which sequences are characteristic of teams, we are potentially able to discover whether successful teams have a common style. We also learn the efficacy of transforming ball-events into predefined symbols. Finally, in the third experiment, we predict which team attempted a given set of passes. We first construct 2D histograms of the locations of the origins of the passes. We then use the histograms as features in a 20-team classification task and achieve an accuracy rate of 0.735 using a learned K-NN classifier, which indicates that teams have characteristic passing styles. The results demonstrate that approaching soccer analytics with a machine learning framework is effective. In addition to achieving good classification performance, we are able to discover useful, potentially actionable, knowledge by investigating the models and features that we construct.
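
The third experiment described in the abstract (2D histograms of pass origins used as features for a K-NN classifier) can be illustrated with a minimal sketch. This is not the thesis's implementation: the pitch dimensions, the grid resolution, the synthetic two-team data, the function names, and the use of plain Euclidean distance with k = 3 are all assumptions made for illustration. The abstract does not state how the "learned" K-NN classifier was configured.

```python
import math
import random

PITCH_LENGTH = 105.0  # assumed pitch length in metres
PITCH_WIDTH = 68.0    # assumed pitch width in metres
BINS_X, BINS_Y = 6, 4  # assumed grid resolution


def pass_origin_histogram(origins):
    """Return a flattened, normalized 2D histogram of pass-origin (x, y) points."""
    counts = [0.0] * (BINS_X * BINS_Y)
    for x, y in origins:
        # Clamp so that points exactly on the far boundary fall in the last bin.
        bx = min(int(x / PITCH_LENGTH * BINS_X), BINS_X - 1)
        by = min(int(y / PITCH_WIDTH * BINS_Y), BINS_Y - 1)
        counts[bx * BINS_Y + by] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def knn_predict(train, query, k=3):
    """Majority vote among the k training histograms closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def synthetic_passes(rng, centre_x, n=200):
    """Hypothetical data: pass origins scattered around a team-specific x position."""
    points = []
    for _ in range(n):
        x = min(max(rng.gauss(centre_x, 10.0), 0.0), PITCH_LENGTH)
        y = rng.uniform(0.0, PITCH_WIDTH)
        points.append((x, y))
    return points


# Build a toy training set: five sets of passes for each of two made-up teams.
rng = random.Random(0)
train = []
for team, centre in (("deep_team", 30.0), ("high_team", 75.0)):
    for _ in range(5):
        train.append((pass_origin_histogram(synthetic_passes(rng, centre)), team))

# Classify an unseen set of passes generated like those of "high_team".
query = pass_origin_histogram(synthetic_passes(rng, 75.0))
predicted = knn_predict(train, query)
print(predicted)
```

Normalizing each histogram makes sets with different numbers of passes comparable. A real 20-team task would need actual event data and a tuned choice of grid size, k, and distance measure.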

Description

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.

This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.

Title as it appears in MIT Commencement Exercises program, June 5, 2015: Application of machine learning technique to discover useful knowledge from event data in soccer. Cataloged from student-submitted PDF version of thesis.

Includes bibliographical references (pages 69-70).

Date issued

2015

URI

http://hdl.handle.net/1721.1/100607

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Graduate Theses