Spending a Sunday with pressing stats

This past Friday was the StatsBomb Innovation in Football conference (I think I’ve remembered the official name right…), a fun day at Stamford Bridge listening to people who’ve spent a lot of time with data. There was me and my colleagues from Twenty3 delivering a talk on how to break down a set defence; there was Ajax Head of Sport Science Vosse de Boode delivering a superb presentation about all the stuff they’re doing over there; there was Thom Lawrence from StatsBomb having an amusing, technical footballing existential crisis of sorts. All the good stuff.

Michael Caley was also there, and his talk was on pressing. It was a really thorough and interesting overview of the different methods that have been used in the public analytics sphere over the years, and it made me want to get my teeth into the problem. (All of the talks from the conference will go up online, and I highly recommend checking them out when they do. If you’re interested in the talk I was involved with on breaking down a set defence, we have a proper paper that will be going online too.)

And so I spent much of my weekend making notes on pressing concepts and data types and going around in circles of investigation before arriving at a method that I’d played around with last month. Let me take you on that journey.

First, let’s tackle the conceptual problems that surround trying to measure pressure in event data (for ‘event data’, think of a row in a spreadsheet for every pass, tackle, shot, etc).

There’s the obvious one that it tells you barely anything about where the defensive team actually is. StatsBomb’s data has ‘pressure’ events as well as things like tackles and interceptions, but that still relies on some kind of action on the part of defensive players.

This is a problem because, like with the famous story of ‘missing’ bullet holes in returning warplanes, teams in possession will pass around the opposing defensive block. And because they pass around them and pass away from potential pressure and risk, there usually isn’t a lot of direct evidence of where defending teams are positioned.

A similar-ish ‘missing data’ problem is that teams who are successful in a high press or high block will stop opponents before they get down the field, and that skews the statistics. A high percentage of a successful high-pressing team’s defensive actions will be high up the field not only because their aim is to force defensive actions in those locations, but also because by doing so they limit the number of actions they will ever need to make further back.

And then we have a bunch of more footballing conceptual problems.

One is that you need to separate ‘pressing’ from ‘defensive block height’. They’re two different, but related, things and if you’re looking to quantify one of them you should be aware of the other.

Another is that you need to separate ‘process’ from ‘result’ (or at least be aware of the interaction between the two). A stat like ‘high turnovers’ is purely about the result, but other metrics may be looking at aspects of both. This is more of a problem if you’re looking for ways to quantify process or intention, as you’ll need to be aware of how much you’re actually just seeing ‘result’ showing up.

And a related problem is that there are two things to want to measure as far as pressing goes: type and efficacy. It’s important to really think about what you’re measuring and what the results may show, because a metric designed to look at type of press may be heavily influenced by the efficacy (or lack of it).

That’s a lot of things to be aware of, and it’s likely that any metric that one puts together is going to run into several of these. Nothing’s going to be perfect. But you can try and be as good as possible.

Because of all of these problems, I think that a lot of the existing metrics — which are mostly quite broad proxies — have an advantage stemming from exactly that broadness.

Colin Trainor’s original PPDA (opposition Passes Per Defensive Action) is really cool, and the concept can be applied to more specific areas of the pitch. It kind of measures process as well as result (harrying the opponent high up the field is process, but it relies on making defensive actions, which is result), but it’s more than fine as a broad look.
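For anyone who wants to see the arithmetic, here is a minimal sketch of a PPDA-style ratio. The event format, the field names, the list of what counts as a defensive action and the pitch band are all my own assumptions for illustration, not Trainor’s exact definition, and the numbers are made up. The x coordinate is assumed to run from 0 to 100 in one shared frame.

```python
# Event types counted as defensive actions: an assumption for this sketch.
DEFENSIVE_ACTIONS = {"tackle", "interception", "foul", "challenge"}


def ppda(events, defending_team, x_min=0.0, x_max=100.0):
    """Opposition passes per defensive action inside the band x_min..x_max."""
    opp_passes = 0
    def_actions = 0
    for ev in events:
        if not x_min <= ev["x"] <= x_max:
            continue  # outside the part of the pitch we care about
        if ev["type"] == "pass" and ev["team"] != defending_team:
            opp_passes += 1
        elif ev["type"] in DEFENSIVE_ACTIONS and ev["team"] == defending_team:
            def_actions += 1
    # With no defensive actions the ratio is undefined, so return None.
    return opp_passes / def_actions if def_actions else None


# Made-up events, purely to show the calculation.
events = [
    {"team": "City", "type": "pass", "x": 20},
    {"team": "City", "type": "pass", "x": 35},
    {"team": "City", "type": "pass", "x": 30},
    {"team": "Chelsea", "type": "tackle", "x": 25},
    {"team": "City", "type": "pass", "x": 80},
    {"team": "Chelsea", "type": "interception", "x": 85},
]

high_band_ppda = ppda(events, "Chelsea", x_min=0, x_max=60)
whole_pitch_ppda = ppda(events, "Chelsea")
```

A lower number means fewer opposition passes allowed per defensive action in that band. Narrowing `x_min` and `x_max` is the ‘more specific areas of the pitch’ version.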

Opposition pass completion percentage is similar, and is something that I’ve used before, and taking it a step further by only looking at passes going forwards is something that Paul Riley has done. He limited the area he looked at to the ‘second box’ (the 18 yards forwards from the 18-yard box), thereby looking at a more specific type of press.
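A rough sketch of that kind of zone-limited stat is below. The pass format and field names are hypothetical, the passing team is assumed to be attacking towards increasing x, and the zone bounds in the example are placeholders rather than Riley’s actual definition.

```python
def forward_completion(passes, x_min, x_max):
    """Share of forward passes starting in x_min..x_max that were completed."""
    attempted = 0
    completed = 0
    for p in passes:
        is_forward = p["end_x"] > p["start_x"]
        in_zone = x_min <= p["start_x"] <= x_max
        if is_forward and in_zone:
            attempted += 1
            completed += p["completed"]  # True counts as 1, False as 0
    # No qualifying passes means there is no rate to report.
    return completed / attempted if attempted else None


# Made-up opposition passes, purely to show the calculation.
passes = [
    {"start_x": 20, "end_x": 30, "completed": True},
    {"start_x": 25, "end_x": 40, "completed": False},
    {"start_x": 30, "end_x": 22, "completed": True},   # backwards, ignored
    {"start_x": 70, "end_x": 80, "completed": True},   # outside the zone
    {"start_x": 33, "end_x": 50, "completed": True},
    {"start_x": 18, "end_x": 36, "completed": False},
]

zone_rate = forward_completion(passes, x_min=18, x_max=36)
```

Whether to bin a pass by where it starts or where it ends is a judgment call. This sketch uses the start location.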

Anyway. Inspired by Caley’s talk, I wanted to tackle ‘pressing’ more systematically, thinking about the best metrics to use for each of the problems that I’ve talked about so far.

I figured that StatsBomb’s data, with their pressure events, would help. I reasoned that a defensive block could perhaps be identified by players receiving the ball under pressure.

Ah, but wait. Surely it would have to be passes towards goal? Players pressured when receiving backwards passes are more likely being ‘pressed’ or closed down in a more active sense, rather than being put under pressure by a defensive block.

Ok, so let’s do that.
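In code, the filter is roughly the following. This is a sketch under my own assumptions: the field names are invented for illustration and are not StatsBomb’s schema, and ‘towards goal’ is simplified to the pass ending further up the pitch than it started.

```python
def pressured_forward_receipts(passes):
    """Receipt locations for completed forward passes received under pressure."""
    return [
        (p["end_x"], p["end_y"])
        for p in passes
        if p["completed"]
        and p["end_x"] > p["start_x"]       # pass went towards goal
        and p["receiver_under_pressure"]    # hypothetical flag
    ]


# Made-up passes, purely to show the filter.
passes = [
    {"start_x": 30, "end_x": 45, "end_y": 20, "completed": True,
     "receiver_under_pressure": True},
    {"start_x": 50, "end_x": 40, "end_y": 35, "completed": True,
     "receiver_under_pressure": True},    # backwards pass, dropped
    {"start_x": 40, "end_x": 70, "end_y": 50, "completed": True,
     "receiver_under_pressure": False},   # no pressure, dropped
    {"start_x": 60, "end_x": 85, "end_y": 10, "completed": True,
     "receiver_under_pressure": True},
    {"start_x": 55, "end_x": 75, "end_y": 30, "completed": False,
     "receiver_under_pressure": False},   # incomplete, dropped
]

points = pressured_forward_receipts(passes)
```

The resulting list of points is what would go on a pitch plot.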

Ah, but wait.

Below is a plot of under-pressure moments following a player receiving a pass that was going towards goal. It shows Chelsea’s pressure in a Chelsea vs Manchester City WSL match in 2018/19, but the point is the massive area of nothing in the middle of the field (Chelsea are defending the goal to the right).

That doesn’t mean that Manchester City were freakishly direct in that match or that Chelsea’s midfield was a sieve, just that this is the nature of the sport.

So that was my initial ‘defensive block height’ idea out of the window. I realised, after several hours of work, that part of the way that a defensive block manifests itself is in the way it forces teams to try and pass around it. (Now that I type that, it seems very obvious).

Back to the drawing board…

Fortunately, I had another idea stowed away which, thinking about it, is pretty similar to Riley’s.

About a month ago, I’d wanted to check out the idea of ‘buffer zones’ — the heights of the pitch where teams struggled to complete passes towards goal. Flipping that gives some gauge of defensive height.

It looks a little like this*, and once again teams are defending the goal on the right. Red means a team is harder to pass through in that area than league average, green means easier:

*this is a quick mock-up I did when I first looked at this, so the data’s out of date by a few weeks now.
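For the curious, the buffer-zone calculation can be sketched along these lines. It is a simplified illustration rather than the exact code behind the mock-up: the pass format, the 10-unit bands and binning by where the pass starts are all assumptions, and the numbers are made up.

```python
from collections import defaultdict


def band_completion(passes, band_width=10):
    """Forward-pass completion rate per band of pitch height, keyed by band start."""
    attempted = defaultdict(int)
    completed = defaultdict(int)
    for p in passes:
        if p["end_x"] <= p["start_x"]:
            continue  # only passes towards goal
        band = int(p["start_x"] // band_width) * band_width
        attempted[band] += 1
        completed[band] += p["completed"]  # True counts as 1
    return {band: completed[band] / attempted[band] for band in attempted}


def relative_to_league(team_rates, league_rates):
    """Team rate minus league rate per band; negative means harder to pass through."""
    return {
        band: team_rates[band] - league_rates[band]
        for band in team_rates
        if band in league_rates
    }


# Made-up passes attempted against one team, plus made-up league rates.
against_team = [
    {"start_x": 42, "end_x": 50, "completed": True},
    {"start_x": 45, "end_x": 60, "completed": False},
    {"start_x": 55, "end_x": 70, "completed": True},
    {"start_x": 60, "end_x": 50, "completed": True},  # backwards, ignored
]
league_rates = {40: 0.75, 50: 0.75}

team_rates = band_completion(against_team)
difference = relative_to_league(team_rates, league_rates)
```

Bands with a negative difference would be the red ones on the chart, and positive ones the green.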

One downside of this is that it doesn’t separate ‘process’ from ‘results’ of the defensive block as much as I would like, nor necessarily ‘pressing’ from ‘block’. A team that attempts a high block but is bad at it will show up as easy to pass through in a similar way to teams that just attempt a low block and cede some of the midfield to their opponents.

However, one part of this that I do like is that it can give a sense of different line heights. Some teams have one band where they’re slowing down their opponents and then another further back. Some are league average everywhere (which, actually, isn’t really what I wanted when I started looking for a gauge of ‘defensive block height’ but ah well).

I guess what I want to get across is some of my thought process about pressing stats that stemmed from Caley’s talk on Friday. The journey did not exactly come to a satisfactory ending. But it was a relatively interesting journey.
