The Stats Perform Pro Forum is the analytics conference for the dreamers. Started nearly a decade ago as a place for hobbyist stats-watchers to meet and present ideas, it's still fundamentally a party organised by a data provider who give amateurs who have interesting ideas the means to explore them.
Now though, it's an established part of the Forum that presenters have the chance to work with tracking data, with mentorship from an array of impressive names.
There is a page from the hosts, Stats Perform, about the presentations here and an overview from The Athletic here. I'm going to give an overview of my own here at the top, and then spend a bit more time on each presentation afterwards.
The best way to understand the Pro Forum is as a kind of advanced brainstorming meeting. Presenters only have a couple of months to do their work. This isn't like the Sloan Sports Analytics Conference in the US, where you'll get research papers representing a year's worth of plugging away at a problem. This is where young practitioners flex their wings and sparks of ideas take flight.
The theme I personally took away from this year's event was "the best kind of TV shopping channel". There were lots of little data-gadgets that you could see being very useful, and ways of thinking about problems that were equally neat.
Things like Gerald Lim, Ashley See, and Chua Zhi Yuan's way of quantifying different types of counterpressing intensity. Or thinking about where to send your scouts as a problem that could be tackled with reinforcement learning, as Arnav Prasad did. Or taking the idea of distance run and using various key performance indicators (KPIs) to get a notion of 'efficient' runs, which formed the basis of Caterina De Bacco's presentation. Or the combination of shot-creating patterns of play and general pass-sequence clustering that Abhishek Mishra and Soumyajit Bose used while investigating defenders' play.
There was clearly increased specificity in a lot of the presentations too. Lim, See, and Chua's was focused on getting out of a counterpress; Bakr Annour and Silvio Matano looked at breaking out of a high press; and Ashwin Phatak, Henrick Biermann, and Franz-Georg Wieland had their attention on counter-attacks.
It probably makes sense to focus on very specific aspects of the game for the Forum, given the short timespan that presenters have to work on their projects. But regular readers of Get Goalside will know that I think having a sound theory of what football is helps you to investigate it with data. Focusing on specific aspects of the sport, I think, probably helps with that. At the very least, it stops you getting lost down agoraphobic dead ends.
Now for the presentation-by-presentation stuff. I'm limiting this to the on-stage talks and not including the poster presentations, partly for space, but mostly because I only made notes on the talks.
There'll be a summary for each that I'll crib from Stats Perform's intro to the presentation, and then some commentary from me.
Expected Counter: Probabilistic modelling of the occurrence and danger of counterattacks in soccer before they occur in real-time — Ashwin Phatak, Henrick Biermann, Franz-Georg Wieland
Summary: Aims to predict the success of a possible counter-attack and its potential danger to the defending team, measured by a new metric: Expected Counter.
Commentary: 'Counter' was defined here as a move starting 70 metres away from the opposition goal and ending within 30m of it, making the journey in under 30 seconds. Don't get too hung up on that if you disagree with it, though. I don't think things would necessarily change much if, for example, any of those boundaries were tweaked.
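To make that definition concrete, here's a minimal sketch of it as a filter in Python. The `Move` record and its field names are my own invention for illustration, not anything shown in the presentation, and I'm reading "starting 70 metres away" as "at least 70 metres away". The thresholds are parameters precisely so that you can tweak the boundaries and see how much changes.

```python
from dataclasses import dataclass


@dataclass
class Move:
    # Hypothetical record of one attacking move (field names are assumptions).
    start_dist_to_goal_m: float  # distance from the opposition goal at the start
    end_dist_to_goal_m: float    # distance from the opposition goal at the end
    duration_s: float            # seconds taken to get from start to end


def is_counter(move, start_min_m=70.0, end_max_m=30.0, max_duration_s=30.0):
    """Return True if the move fits the 70m / 30m / 30s definition above."""
    return (
        move.start_dist_to_goal_m >= start_min_m
        and move.end_dist_to_goal_m <= end_max_m
        and move.duration_s < max_duration_s
    )
```

Calling `is_counter(move, start_min_m=60.0)` on the same set of moves is the quickest way to check how sensitive any result is to where the lines are drawn.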
There were some neat observations about where counters were more likely to spring from, in terms of where the turnovers were. Within the pre-defined 'start of counter' area, the central area closest to the halfway line was the most likely origin. That might be unsurprising, but there was a marked drop-off between the centre of that area and the wings at the same height of the pitch.
As well as the locations, the researchers looked at whether the numbers of attackers vs defenders made a difference (it did) and whether the compactness of the team being countered against made a difference (it also did).
This all might appear to be expected, but I think the important thing is the relative differences. With this kind of work, you'd have a better way of gauging the defensive value of keeping a player further back in possession to defend against counters compared to the cost that might have in attack, for example.
This particular type of work could also be done with freezeframe data (a snapshot of tracking data for each on-ball action), rather than the full-fat full-match tracking data that this used. That'd probably save on some computing power (by machine or person).
Identifying ways to efficiently break a high press — Bakr Annour, Silvio Matano
Summary: Applying event and tracking data to generate formation and player recommendations based on the pressing styles adopted by an upcoming opponent. The presentation outlined how sequences of interest are identified and how their success is defined, explained how their model was used to 'place them' in a common space, and how the backend of a hypothetical application would interact with the model.
Commentary: A lot of my favourite presentations at Forums in the past have been the ones that edge into being tech talks, and this was one of them.
Identifying typologies and similar situations of high presses with tracking data seems tough, because as well as just looking at, say, the height of the press, there are all sorts of other factors you need to account for to make the similarity seem right.
Although examples in presentations can always be cherrypicked, their results certainly seemed plausible. There was also an interesting piece of insight that occupying the centre of the pitch close to the halfway line appeared to be an important factor in a team breaking through a high press.
Measuring KPIs for scoring efficient runs — Caterina De Bacco
Summary: From a defensive standpoint, the model labeled runs based on whether they resulted in attacking players being marked or closed down, or interrupted a potential passing line. In attack, runs were credited based on whether they generated a passing opportunity, attacked vacant space, or triggered movement from an opposition player which in turn created space for a teammate.
Commentary: This presentation combined two things I really like. One was taking an existing fixation of football (distance run) and thinking about it in a smart way. One of the outputs of these models was literally a set of graphs showing a player's distance run throughout a match and their 'effective' distance run (i.e. the amount they ran while doing good things). You could, as De Bacco did, easily create a ratio with these figures and compare players based on what proportion of their running was 'effective'.
The other thing is approaching the problem of football in a smart way and breaking it down into component parts. As with most things in analytics, you could quibble around the edges of the models that De Bacco created, and she pointed out some areas for improvement in them herself. But individually they all made sense, and collectively they covered most things in football.
I think that this presentation was a great example of how thinking about football smartly can really help produce good analytics work.
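The 'effective' running ratio mentioned above is simple arithmetic once the runs are labelled. A small sketch, assuming you already have each run's distance and a yes/no label for whether it counted as effective. The labelling is the hard part and is the job of De Bacco's models, not this snippet; the function name and input shape here are hypothetical.

```python
def effective_run_ratio(runs):
    """Share of a player's running distance that was labelled 'effective'.

    runs: iterable of (distance_m, is_effective) pairs for one player.
    Returns effective distance divided by total distance, or 0.0 if the
    player covered no distance at all.
    """
    total = 0.0
    effective = 0.0
    for distance_m, is_effective in runs:
        total += distance_m
        if is_effective:
            effective += distance_m
    return effective / total if total > 0 else 0.0
```

A player with one effective 100m run and one ineffective 300m run would come out at 0.25, which is the kind of number you could then compare across players.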
A reinforcement learning approach to scout allocation and talent discovery in football — Arnav Prasad
Summary: The presentation demonstrated how clubs can build and test various algorithms to inform their own strategic decision-making, in this case with how they allocate their scouts.
Commentary: Reinforcement learning is one of those things that is surprisingly like what it says on the tin. You give a computer a problem and it updates as it goes along; the results reinforce the learning.
Most 'analytics' work focuses on what happens on the pitch, but it's neat seeing things that go beyond that. Crucially, this presentation recognised that scouting operations sometimes have different aims: some want to uncover gems, potentially at the expense of guaranteed quality; others don't mind missing the unexpected upside, preferring the certainty.
If you created values for how much talent you thought might be in each scoutable region, and then used scout reports to update that, perhaps this type of approach could be useful.
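To give a flavour of that 'hold a belief about each region, then update it with scout reports' idea, here's a minimal sketch. It is not Prasad's method: it's a textbook multi-armed bandit technique (Thompson sampling with Beta priors) that I've swapped in because it's the simplest thing with the same shape. The region names and hit rates in the usage note are made up, and the 'scout report' is just a simulated coin flip.

```python
import random


def allocate_scouts(true_hit_rates, n_trips, seed=0):
    """Thompson sampling over scoutable regions.

    true_hit_rates: dict of region -> probability that a trip finds a good
                    player (made-up numbers, used only to simulate reports).
    Returns a dict of region -> number of trips sent there.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior per region, stored as [finds + 1, misses + 1].
    beliefs = {region: [1, 1] for region in true_hit_rates}
    trips = {region: 0 for region in true_hit_rates}

    for _ in range(n_trips):
        # Draw a plausible hit rate for each region from the current belief...
        sampled = {r: rng.betavariate(a, b) for r, (a, b) in beliefs.items()}
        # ...and send the scout wherever looks best under that draw.
        region = max(sampled, key=sampled.get)
        trips[region] += 1

        # Simulated scout report: did the trip turn up a good player?
        found = rng.random() < true_hit_rates[region]
        beliefs[region][0 if found else 1] += 1

    return trips
```

Something like `allocate_scouts({"Region A": 0.9, "Region B": 0.1}, n_trips=500)` should end up sending most trips to Region A while still occasionally checking on Region B. A club with a different appetite for risk would want a different reward than a plain found-or-not, which is where the 'different aims' point comes in.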
Measuring and modelling defensive efficiency with only event data — Abhishek Amol Mishra, Soumyajit Bose
Summary: Evaluated ways of judging the defensive ability of centre-backs with event data, with possession-adjusted existing metrics and new metrics focusing on opposition goal, shot, and threat prevention.
Commentary: Yes, I have a predisposition to like any presentation focusing on not just defending, but centre-backs. I will try not to let that skew me. Unfortunately my notes on this were also not as thorough as they might have been.
There was a broad collection of metrics which (somewhat similarly to De Bacco's presentation) focused on different areas/types of defending.
There were convex hull-based stats (reminiscent of PATCH, although a fresh spin on it). These counted how many passes went into the area a player is usually responsible for defending, and how many attempted passes into it were actually completed.
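For anyone who hasn't met convex hulls before, here's a rough sketch of the general idea in plain Python. It is neither PATCH nor Mishra and Bose's actual method: I'm assuming, purely for illustration, that a defender's 'area' is the hull around the locations of their defensive actions, and the function names and input shapes are hypothetical.

```python
def _cross(o, a, b):
    """Cross product of OA and OB; positive if O -> A -> B turns anticlockwise."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in anticlockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside_hull(hull, point):
    """True if point is inside or on the edge of an anticlockwise convex hull."""
    if len(hull) < 3:
        return False
    return all(
        _cross(hull[i], hull[(i + 1) % len(hull)], point) >= 0
        for i in range(len(hull))
    )


def passes_into_zone(defender_actions, passes):
    """Count opposition passes ending inside a defender's zone.

    defender_actions: (x, y) locations of the defender's actions.
    passes: (end_x, end_y, completed) tuples for opposition passes.
    Returns (attempted_into_zone, completed_into_zone).
    """
    hull = convex_hull(defender_actions)
    into = [p for p in passes if inside_hull(hull, (p[0], p[1]))]
    return len(into), sum(1 for p in into if p[2])
```

In practice you'd probably trim outlying actions before drawing the hull, since one stray tackle on the far touchline would otherwise stretch the zone across the pitch.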
Other metrics focused on clustering the paths that opponents created chances through. One awkward thing about quantifying defending is that you're often trying to work out some kind of counterfactual: if a defender makes a defensive action, we have no real knowledge of what they stopped from happening.
The clustering technique seems a pretty innovative way of trying to get around this. It's kind of like constructing a plausible 'what if', in a different way to an expected possession value model.
In the results, different defenders tended to do well at different metrics, again highlighting the worth in splitting things up into different components.
Pressing Times: Can data tell us when and how to navigate out of a counter press? — Gerald Lim, Ashley See, Zhi Yuan Chua
Summary: Quantified counter-pressing strategies through the application of different metrics, to compute the success of various decisions taken against the counterpress based on key outcomes like ball progression, retention time, and the threat conceded from losing possession.
Commentary: This was another presentation where thinking about the problem smartly, and not being afraid to spend time breaking it down, worked wonders.
The researchers used a pitch control model to quantify the intensity of the counterpress, but they had a few different methods for it. There are different kinds of pressing strategies — ball-oriented, player-oriented, passing lane-oriented — and each of them got their own approach and metric. This alone has potential value in analysis.
They then looked into different methods that a team could use to get around a counterpress: clearances, long passes, short passing, that sort of thing.
There was a point, almost made in passing, that I thought was interesting about 'scouting' for the national team. I know that people have talked before about players from the same club side making for good national teammates, but not every nation is turn-of-the-2010-decade Spain. The point the researchers made here was that you could use these metrics to look for players who played similar counterpressing styles for their club side.
As I said at the start of this, the best way to understand the Pro Forum is as a kind of advanced brainstorming meeting. This was a particularly good one.