Research in Focus: Deep reinforcement learning decision-making

Pegah Rahimian, Jan Van Haaren, Togzhan Abzhanova, Laszlo Toka, 'Beyond action valuation: A deep reinforcement learning framework for optimizing player decisions in soccer'

[Presented at the 2022 MIT Sloan Sports Analytics Conference (link)]

Why it's worth your time

Research into decision-making is always interesting, and it's worth being aware of if you care specifically about on-pitch applications of analytics.

It also uses reinforcement learning to model the possible decisions players can take. This means the computer is given some pointers about what success looks like and tries to find an optimal strategy, rather than looking at past events and saying 'this is what has worked before'.


What it says

The researchers create two 'pitch surface' models*: one for the likelihood of the next action’s location (based on teams' behaviours) and another for the likelihood of a successful action to that location.


*literally, a model that can be applied to each inch of the surface of the pitch. The probability of the next action's location can be displayed as a heatmap over the field.
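To make the 'pitch surface' idea concrete, here's a minimal sketch. The grid size and the distance-based weighting are placeholders I've invented; the paper's actual surfaces are learned from team behaviour, not hand-written like this. The point is just the shape of the output: a probability for every cell of the pitch, which sums to 1 and can be drawn as a heatmap.

```python
# Toy 'pitch surface': discretise the pitch into cells and assign each cell
# a probability of being the next action's location. The grid dimensions and
# the distance-decay weighting below are illustrative assumptions only.

GRID_W, GRID_H = 12, 8  # cells across and down the pitch (assumed)

def next_location_surface(ball_x, ball_y):
    """Stand-in for a learned model: probability of the next action
    targeting each cell, decaying with distance from the ball."""
    weights = []
    for gy in range(GRID_H):
        row = []
        for gx in range(GRID_W):
            dist = abs(gx - ball_x) + abs(gy - ball_y)  # Manhattan distance
            row.append(1.0 / (1 + dist))
        weights.append(row)
    total = sum(sum(row) for row in weights)
    # Normalise so the whole surface sums to 1, like a probability heatmap
    return [[w / total for w in row] for row in weights]

surface = next_location_surface(ball_x=3, ball_y=4)
```

Each row of `surface` is one band of the pitch; feed it to any heatmap plotter and you get the kind of visual the researchers describe.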


The reinforcement learning model needs 'reward functions' to help it learn whether a move is good or bad. Obviously winning matches and scoring goals are good, but they're infrequent. The researchers created different reward functions for different phases of play (e.g. the transition phase or the build-up phase). They use a policy gradient algorithm to train the model towards maximising the chances of scoring goals.
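A phase-dependent reward function might look something like the sketch below. The event names, the numbers, and the scoreline tweak are all made up for illustration; only the structure (sparse goal reward, plus smaller phase-specific rewards) reflects the idea described above.

```python
# Hypothetical phase-dependent reward shaping. Phase names follow the
# write-up; the event labels and numeric values are invented placeholders,
# not the paper's actual reward functions.

def reward(phase, event, score_diff=0):
    if event == "goal":
        return 1.0  # the big, sparse reward: goals are always good
    if phase == "transition" and event == "progressive_pass":
        return 0.1  # moving the ball forward quickly matters in transition
    if phase == "build_up" and event == "retain_possession":
        # assumed example: value safe possession more when not trailing
        return 0.08 if score_diff > 0 else 0.05
    return 0.0
```

The dense, smaller rewards give the learner a signal between the rare goals, which is exactly why reward shaping per phase helps here.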

This gives the researchers an 'optimal policy' (i.e. the decision of where to move the ball) which they can compare to the 'behavioural policy' (i.e. what actually happened).
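One simple way to compare the two policies is to treat each as a probability distribution over destinations and measure the distance between them. The destinations and probabilities below are made-up placeholders, and total variation distance is just one reasonable choice of metric, not necessarily the paper's.

```python
# Sketch of comparing an 'optimal policy' to the 'behavioural policy' for
# one game state. Destination labels and probabilities are illustrative.

optimal_policy = {"left_wing": 0.15, "centre": 0.55, "right_wing": 0.30}
behavioural_policy = {"left_wing": 0.40, "centre": 0.25, "right_wing": 0.35}

def policy_gap(optimal, behavioural):
    """Total variation distance: 0 means the team already plays the
    model's way; 1 means the choices never overlap."""
    return 0.5 * sum(abs(optimal[a] - behavioural[a]) for a in optimal)

gap = policy_gap(optimal_policy, behavioural_policy)
```

A per-state gap like this is what lets you say 'the model would have gone central here, the team went wide' in a quantified way.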

As well as calculating this optimal policy for the league as a whole (the 2020/21 season and the early part of the 2021/22 season of the Belgian league, in their case), they also calculated it for individual teams to account for team styles. This was done by running step 1, the pitch surface models, for each team.


Why I think it's cool

Reinforcement learning seems cool. It's how computers learn to beat humans at games like chess and Go, so, yeah, maybe it could help with football too.

The use of reinforcement learning may have forced this on the researchers, but I also really like the nod to different phases of play. It makes a lot of sense that actions in different phases of play come with different sets of incentives. And it's not just the phase of play that causes the reward functions to differ: for the build-up phase, the reward depended on the scoreline too, for example.

I'm not in a position to assess how much better this approach is than alternatives at predicting the value of a footballing action. For example, I haven't managed to get my head around what would happen if a team simply changed their style of play (put another way, how optimal is the 'optimal policy' of a bad strategy?). However, the general approach certainly feels like it captures something inherently real about football. Which I think, or at least hope, means it's on the right path.



'Research in Focus' is like SparkNotes for football analytics: summarising and analysing the best research out there. Get Goalside supporters get access to every post, with a rotating selection free to access for all. Follow this link for the list of all Research in Focus pieces.