MSAI students collaborate with Northwestern Football

Students are using artificial intelligence to attempt to predict game outcomes

MSAI students Vamsi Banda, Noah Caldwell-Gatsos and Eric Yang

By Vamsi Banda, Noah Caldwell-Gatsos and Eric Yang

Over the past quarter, we have been collaborating with the Northwestern Wildcats to develop a tool that surfaces the aspects of in-game performance that predict outcomes and uses them to advise the coaches. In this work, we have developed a model that analyzes and predicts the outcomes of football games from attributes of games in play (“game-level attributes”), using data that spans the last four years and multiple teams. The focus, however, is to go beyond prediction and provide explanations.

A major component of this work is establishing which attributes matter most in determining the outcome of a game. A good part of this involves analyzing continuous variables (e.g., “number of yards run at halftime”) and determining meaningful threshold cutoffs for them. With this information, we hope to tell the Wildcat coaches which aspects of the game the data suggests they should focus on most during play.

As an example of how our findings could apply to the NU Football program: if the most influential attribute turns out to be the difference in the number of turnovers between the two teams, the coaches can focus on minimizing turnovers through training. As another example, the data includes a feature called run percentage, which can be contrasted with pass percentage. Examining this contrast against game outcomes allowed us to conclude that teams whose run percentage is greater than their pass percentage perform better. This suggests that coaches should focus on training and strategy that emphasize running with the ball rather than passing. Another route of investigation was the total impact of losing the ball on a team’s performance during a game. While it seems obvious that having the ball intercepted is bad, our team’s research shows the specific consequences of losing the ball and what it means for the remainder of the game.
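The run-versus-pass contrast described above can be sketched as a simple grouped win rate. This is a minimal illustration, not our actual analysis code: the `TeamGame` record, its field names, and the toy rows are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class TeamGame:
    """One team's stats for one game (hypothetical field names)."""
    run_pct: float   # share of offensive plays that were runs
    pass_pct: float  # share of offensive plays that were passes
    won: bool


def win_rate(rows):
    """Fraction of rows in which the team won; None if there are no rows."""
    rows = list(rows)
    if not rows:
        return None
    return sum(r.won for r in rows) / len(rows)


def compare_run_vs_pass(rows):
    """Return (win rate of run-heavy rows, win rate of all other rows)."""
    run_heavy = [r for r in rows if r.run_pct > r.pass_pct]
    other = [r for r in rows if r.run_pct <= r.pass_pct]
    return win_rate(run_heavy), win_rate(other)


# Made-up rows for illustration only; these are not the project's data.
toy = [
    TeamGame(0.60, 0.40, True),
    TeamGame(0.55, 0.45, True),
    TeamGame(0.52, 0.48, False),
    TeamGame(0.45, 0.55, False),
    TeamGame(0.40, 0.60, True),
    TeamGame(0.35, 0.65, False),
]
run_heavy_rate, other_rate = compare_run_vs_pass(toy)
```

A gap between the two rates is only a first signal; a comparison like this does not by itself show that running more causes wins.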

We have derived fifteen attributes and aggregated data from 119 games for all of the Big Ten teams between 2014 and 2018. Because we have statistics for both teams in each of these games, our final dataset has 238 samples. For each of our models, we use 5-fold cross-validation to check whether the model is overfitting and to estimate its accuracy on games it has not seen.
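To make the 5-fold setup concrete, here is a minimal sketch in plain Python. The helper names (`k_fold_indices`, `cross_validate`) and the `fit`/`predict` callables are assumptions made for this example, not the code we ran. Each fold is held out once while the model is fit on the other four, and a training accuracy far above the held-out accuracy is the sign of overfitting we look for.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # A stride of k gives fold sizes that differ by at most one.
    return [indices[i::k] for i in range(k)]


def cross_validate(samples, labels, fit, predict, k=5, seed=0):
    """Return (train_accuracy, held_out_accuracy) for each of the k folds."""
    folds = k_fold_indices(len(samples), k, seed)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(samples)) if i not in held]
        model = fit([samples[i] for i in train], [labels[i] for i in train])

        def accuracy(idx):
            hits = sum(predict(model, samples[i]) == labels[i] for i in idx)
            return hits / len(idx)

        scores.append((accuracy(train), accuracy(held_out)))
    return scores


N_GAMES = 119
N_SAMPLES = 2 * N_GAMES  # one row per team per game, so 238 rows
fold_sizes = [len(fold) for fold in k_fold_indices(N_SAMPLES)]
```

With 238 samples, three folds hold 48 rows and two hold 47. One design question this sketch leaves open is whether the two rows that come from the same game should be kept in the same fold, since they describe the same event from opposite sides.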

Our algorithmic approach has been to apply support-vector machine and decision tree models to this data. We chose these algorithms primarily because our goal is not just to predict the outcome of a game from these stats, but to diagnose which attributes most influence the outcome and to find threshold cutoffs for those attributes. Our decision tree model will give us insight into which attributes are most important, and our support-vector machine will help us determine the thresholds.
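As a rough illustration of how a tree-style model points to important attributes and cutoffs, the sketch below scores every candidate cut on each attribute by how much it reduces Gini impurity, which is the kind of test a decision tree applies at a single node. It is a simplified stand-in, not our trained models: it looks at one split per attribute only, it does not cover the support-vector machine, and the attribute names and toy rows are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(values, labels):
    """Return (threshold, impurity_reduction) for the best single cut on one attribute."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini([y for _, y in pairs])
    best = (None, 0.0)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut fits between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - weighted
        if gain > best[1]:
            best = ((lo + hi) / 2, gain)  # cut halfway between neighbours
    return best


def rank_attributes(rows, labels):
    """Sort attribute names by the impurity reduction of their best cut."""
    results = {name: best_split([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(results.items(), key=lambda item: item[1][1], reverse=True)


# Made-up rows for illustration only; attribute names are hypothetical.
toy_rows = [
    {"turnover_diff": -2, "run_pct": 0.42},
    {"turnover_diff": -1, "run_pct": 0.58},
    {"turnover_diff": -1, "run_pct": 0.47},
    {"turnover_diff": 1, "run_pct": 0.51},
    {"turnover_diff": 2, "run_pct": 0.44},
    {"turnover_diff": 3, "run_pct": 0.61},
]
toy_won = [0, 0, 0, 1, 1, 1]  # 1 = win, 0 = loss
ranking = rank_attributes(toy_rows, toy_won)
```

In this toy data the turnover attribute separates wins from losses cleanly at a cutoff of zero, so it ranks first. On real data the gaps between attributes would be much less clear-cut, which is why the ranking needs to be checked under cross-validation.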

Our next step will be to focus on fine-tuning hyperparameters and testing new models as we have access to additional datasets through ProFootballFocus. The goal throughout is to provide a model that is both predictive and explanatory, with an eye towards supporting an advisory tool for the Wildcat coaches.