23 Jul 2024

Understand football and you'll understand AI

More than most Get Goalside pieces, this is written with a wide audience in mind. Also more than most Get Goalside pieces, it owes a lot to those who’ve helped in its writing process. Their assistance is far greater than the digital ink of direct quotes. Any mistakes are mine.

Jokes are mine if they land.

—

The famous Bill Shankly line is that football is a simple game complicated by idiots. AI is a simple concept complicated by almost everybody. What happens when the two come together?

Most versions of a ‘football and AI’ article would be about the who-s and the what-s of this collision, and there’ll be a bit of that here, don’t worry. But what’s really important are the how-s and the why-s.

If you’re looking for ‘X team is the leader in artificial intelligence’, you’ll be disappointed by this piece; but if you want to understand how AI itself works, and why the basic mechanics of it are a lot like football, you’re in luck.

What’s in a name?

To begin with, let’s leave the term ‘AI’ at the door. It’s more of a brand name than anything at the moment, but maybe that’s always been the case. Since the 1970s there have been variations on the following quip flying around: “As soon as it works, no one calls it AI any more.”

Instead, we’ll talk about ‘machine learning’. It’s a more literal term, involving no philosophy about what ‘intelligence’ really is. You train a footballer to master a skill; you train a machine learning model to complete a particular task.

Simple. Bill Shankly may well approve.

Goals and goals

There will be one and only one piece of hair-splitting in this article. It’s this: on the face of it, football is a sport about scoring more than your opponent; but to be specific there are two parts to this, scoring more and conceding fewer. If you don’t believe that there’s a difference, ask a fan of Tottenham Hotspur men’s team.

Whole schools of tactical thought, from Ange Postecoglou to Jose Mourinho, are based on the idea that either scoring more or conceding fewer is more important, or more achievable, than the other. Being specific about objectives is important.

For example, you can imagine how a TacticalTheoryBot might reach a different conclusion if the ultimate objective is ‘avoid conceding’ or ‘score quickly’. If the latter, you might end up with something like route one football. If the former, avoiding conceding, you might end up with something like ‘defensive possession’, a more modern term for an old concept of killing a game off by playing keep-ball.

However, whether your objective leads to you lumping it long or passing it short, both footballers and machine learning modellers have to deal with the very low-scoring nature of football. In both cases, they need some kind of signal that tells them whether they're learning correctly and whether what they're doing is good or bad.

In some cases it might work just to switch ‘goals’ for ‘shots’. But in some, you’ve got to change your whole mindset.

Learning styles

We’ve said already that you ‘train’ a machine learning model, the same word we use for footballers. Like with humans, there are different methods of teaching and learning too. (As a tangent, I’d love to hear from education specialists on what they think about ‘learning styles’).

In machine learning, there are some broad categories that different techniques can fall into. There’s supervised learning, where each bit of training has a label attached, like a tick or a cross; there’s unsupervised learning, which is more about pattern-finding or grouping; and there’s reinforcement learning, which is a bit more like human trial and error.

For some examples: creating an expected goals model would be a case of supervised learning, with the model being trained by ‘goal or no goal’ info for each shot. Examples of unsupervised learning would be anything on ‘clustering’, like this or this from The Athletic. However, for other parts of football, reinforcement learning can have its advantages.
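To make the supervised learning case concrete, here is a minimal sketch of a one-feature expected goals model. Everything in it is an assumption for illustration: the shots are invented, distance is the only input, and the function names are made up. Real xG models use far more data and features. The 'label' on each shot is simply goal (1) or no goal (0).

```python
import math

# Invented training shots: (distance to goal in metres, 1 = goal, 0 = no goal).
SHOTS = [(5, 1), (6, 1), (8, 0), (9, 1), (11, 0), (12, 1),
         (15, 0), (18, 0), (20, 0), (22, 1), (25, 0), (30, 0)]


def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_xg(shots, steps=5000, learning_rate=0.01):
    """Fit p(goal) = sigmoid(w * distance + b) by gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(shots)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for distance, goal in shots:
            # The label (goal or no goal) tells the model how wrong it was.
            error = sigmoid(w * distance + b) - goal
            grad_w += error * distance
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def expected_goals(distance, w, b):
    """Predicted chance of scoring from a given distance."""
    return sigmoid(w * distance + b)


w, b = train_xg(SHOTS)
print(f"xG from 5m:  {expected_goals(5, w, b):.2f}")
print(f"xG from 25m: {expected_goals(25, w, b):.2f}")
```

With these toy shots, the fitted model rates the close-range effort as more likely to go in than the long-range one, which is all the labels were ever asking it to learn.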

Dr Pegah Rahimian is a football data scientist at Twelve Football and post-doctoral researcher at Uppsala University (PhD in Football Data Science). One of the research papers she’s worked on (a Get Goalside favourite) used reinforcement learning to evaluate choices that teams make in spells of possession. She explains: “We cannot immediately see if, for instance, a backward pass on the left wing in a particular game state is good or bad in terms of increasing the chance of scoring at the end of a possession. However, we can design some short-term rewards for that pass which depends on our modelling approach.

“For example, whether a pass was successful or a turnover, the expected threat value of the pass, et cetera. Having these short-term rewards and sequential nature of the game, we can model it as a Markov Decision Process [a mathematical framework] and use reinforcement learning to get an optimal action.”

As a general concept, reinforcement learning is similar to human trial and error. The model gets nudged towards better outcomes, but without needing a definitive ‘right’ and ‘wrong’ answer. When training footballers, you can’t give them a list of situations and a ‘right’ and ‘wrong’ thing to do. But you can set them training drills, maybe focused on specific areas of the game, where a points system can give them some indication of what was better or worse.
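As a rough sketch of the Markov Decision Process idea, the toy below splits the pitch into three zones and gives each action some invented chances of moving the ball on, scoring, or losing it. One important caveat: it uses value iteration, a technique that solves an MDP directly when you already know the probabilities, rather than reinforcement learning, which has to learn them through trial and error. The zones, actions, and numbers are all hypothetical, and the only reward here is a goal.

```python
# Hypothetical possession model:
# state -> action -> list of (probability, next_state, reward).
# "goal" and "turnover" end the possession, so they have no actions.
MDP = {
    "own_third": {
        "short_pass": [(0.9, "middle_third", 0.0), (0.1, "turnover", 0.0)],
        "long_ball": [(0.4, "final_third", 0.0), (0.6, "turnover", 0.0)],
    },
    "middle_third": {
        "short_pass": [(0.8, "final_third", 0.0), (0.2, "turnover", 0.0)],
        "back_pass": [(0.95, "own_third", 0.0), (0.05, "turnover", 0.0)],
    },
    "final_third": {
        "shoot": [(0.1, "goal", 1.0), (0.9, "turnover", 0.0)],
        "back_pass": [(0.9, "middle_third", 0.0), (0.1, "turnover", 0.0)],
    },
}


def action_value(outcomes, values, discount):
    """Expected reward now plus expected value of wherever the ball ends up."""
    return sum(prob * (reward + discount * values.get(next_state, 0.0))
               for prob, next_state, reward in outcomes)


def value_iteration(mdp, sweeps=100, discount=1.0):
    """Repeatedly update each zone's value, then read off the best action."""
    values = {state: 0.0 for state in mdp}
    for _ in range(sweeps):
        values = {
            state: max(action_value(outcomes, values, discount)
                       for outcomes in actions.values())
            for state, actions in mdp.items()
        }
    policy = {
        state: max(actions,
                   key=lambda a: action_value(actions[a], values, discount))
        for state, actions in mdp.items()
    }
    return values, policy


values, policy = value_iteration(MDP)
for state in MDP:
    print(state, round(values[state], 3), policy[state])
```

Under these made-up numbers, the value of having the ball in your own third comes entirely from what might happen several passes later, which is the point Rahimian makes about not being able to judge a single pass in isolation.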

There’s probably even a machine learning version of a rondo.

But if there is, it’ll involve data, not footballs. Because, depressing as it may sound, there are no sports in the world of a machine learning model. Only maths.

Knowing the game

“Data science algorithms in general are just sets of rules,” says Maia Trower, a football analytics consultant and PhD student in data science at the University of Edinburgh. “Computers allow us to implement complicated algorithms on huge amounts of data to get solutions to problems that weren’t previously possible.” David Sumpter, professor of applied mathematics at Uppsala University and co-founder of Twelve Football, said something very similar when speaking for this newsletter: “Machine learning is mostly statistics and fitting curves to data.”

To see what they mean, let’s take an example many people will have come across at school, the ‘line of best fit’ to a set of data. The maths here is quite simple: you draw a line, then add up how far each datapoint is from it; then you draw another line and do the same sums and see if it’s closer. It’s possible to do that as a human over and over again until you find the line of best fit, but it’s a task that computers can do in a flash.
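The draw-a-line-and-check process can be sketched in a few lines of code. The five datapoints below are invented, and 'how far each datapoint is from the line' is taken to mean the squared vertical distance, which is the usual choice but still an assumption here. The first function tries about ten thousand candidate lines, as a very patient human might; the second uses the standard least-squares formula to jump straight to the answer.

```python
# Invented datapoints (x, y) that sit roughly along a straight line.
POINTS = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]


def total_error(slope, intercept, points):
    """Add up the squared vertical distance from each point to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)


def best_line_by_trial(points):
    """Try every slope and intercept from -5.0 to 5.0 in steps of 0.1."""
    candidates = [(s / 10, i / 10)
                  for s in range(-50, 51)
                  for i in range(-50, 51)]
    return min(candidates,
               key=lambda line: total_error(line[0], line[1], points))


def best_line_by_formula(points):
    """Ordinary least squares for a single input variable."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x


trial_slope, trial_intercept = best_line_by_trial(POINTS)
formula_slope, formula_intercept = best_line_by_formula(POINTS)
print("trial and error:", trial_slope, trial_intercept)
print("formula:", round(formula_slope, 2), round(formula_intercept, 2))
```

The two answers land within a grid step of each other. The trial-and-error version is wasteful, but it is the same idea that scales up: propose something, measure how wrong it is, keep the better option.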

The ‘maths’ behind the algorithms isn’t always as simple as this, but the point remains: the skill of doing machine learning isn’t all about fancy equations. When grumpy football coaches or pundits say ‘you have to know the game’ they do sometimes have a point, and the same is very true with machine learning.

“Many algorithms come with a set of assumptions and caveats, and results or properties that are true if those assumptions are met,” Trower explains. Although the algorithms might just be ‘a set of rules’, “someone has to develop the rules, and someone also has to prove that they work and how well they work.” Which is where the professors and PhDs come in.

This is why they make you study maths at school, really. Someone’s gotta grow up and understand the maths that underpins all of the data science in our lives, even if most of us will only be consumers of the tech that makes use of it.

One thing school doesn’t force us to learn, though, is football theory. That’s a problem for football data science. The best machine learning in football will always be done by the groups of people who understand both areas. When speaking with David Sumpter for this piece, he noted that “lots of things we observe in football are not in the data […] So I would describe the decision of the practitioner who works only from data as throwing away everything we know about football before we start!” (a complexity which he writes about further here).

Knowledge, it turns out, is important.

ChatGPT can make mistakes. Check important info.

(header taken from message displayed when beginning a fresh ChatGPT session).

Although this post has hopefully been readable so far, you really have to be applauded for going through so many facets of machine learning - the goal-setting, the ways of teaching a model whether it’s on the right track, the value of maths - without even getting a mention of ChatGPT yet.

It’s a genuinely fascinating tool. ‘Tool’ is an important word though, because behind ‘ChatGPT’ is the ‘GPT’ model itself (a set of initials we’ll explain soon). Like all machine learning models, we can apply the previous sections of this post to it.

The goal of a GPT model - part of a wider family of ‘large language models’ (LLMs) - is basically to predict the most relevant next word in a sentence. “Essentially, LLMs are advanced probabilistic auto-complete systems,” as Dr Ryan Beal, CEO & co-founder of SentientSports, puts it.

When humans speak, we usually have some kind of thought in mind, even just a vague one, that we then put into words. The large language models are putting one word in front of the next based purely on what has just been said. This is where ‘hallucination’ comes from: it’s a prediction bad enough to stick out like a sore thumb (or, more worryingly, not to stick out like a sore thumb even when it’s wrong).

This prediction of the next word is also how they’re trained, and explains why so much data is needed by the companies which create them. A model which is built on the relationships between words will always need to be able to examine those relationships.
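A heavily shrunken version of 'probabilistic auto-complete' might look like the sketch below. It is a bigram model, which only ever looks at the single previous word, so it is a distant ancestor of a GPT model rather than a small copy of one. The four-sentence corpus and the function names are invented for illustration.

```python
from collections import Counter, defaultdict

# Invented training text. Full stops are treated as words of their own.
CORPUS = (
    "the striker scores a goal . "
    "the striker scores a penalty . "
    "the striker scores a goal . "
    "the keeper saves a shot ."
)


def train_bigram_model(text):
    """Count which word follows which in the training text."""
    next_word_counts = defaultdict(Counter)
    words = text.split()
    for current_word, next_word in zip(words, words[1:]):
        next_word_counts[current_word][next_word] += 1
    return next_word_counts


def next_word_probabilities(model, word):
    """Turn the raw counts after a word into probabilities."""
    counts = model[word]
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


def autocomplete(model, start, length):
    """Keep appending the most likely next word."""
    words = [start]
    for _ in range(length):
        counts = model[words[-1]]
        if not counts:
            break  # the model has never seen anything follow this word
        words.append(counts.most_common(1)[0][0])
    return " ".join(words)


model = train_bigram_model(CORPUS)
print(next_word_probabilities(model, "a"))
print(autocomplete(model, "the", 4))
```

The model has no idea what a striker is. It only knows that, in the text it was shown, 'striker' tends to follow 'the'. Show it different text and it will confidently say different things, which is a small-scale version of why training data matters so much.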

“Beyond technical advancements, significant progress [in LLMs] has been driven by substantial financial investments in training these models,” says Beal. “Training GPT-4, for instance, reportedly cost around $100 million. Thus, the combination of innovative architectures and large-scale financial investments has been instrumental in achieving the current capabilities of models like ChatGPT.”

There may well be some financial parallels to football. I prefer not to speak.

But yes: alongside the heft of money bags and illegal data scraping there is genuine technical expertise in these models - the ‘GPT’ in ‘ChatGPT’.

GPT stands for ‘Generative Pre-trained Transformer’. A ‘transformer’ in this case is a type of machine learning architecture, a very sophisticated ‘set of rules’ for running data through a model. As with football tactics, these approaches have evolved over time, with the big leaps forward usually having precursors that the older heads would recognise well.

To go back to the ‘machine learning model as player education’ metaphor, an ‘architecture’ of a model is like the intricacies of a pressing strategy and the teaching methods used to get a player to do it right.

“Transformers have been pivotal in analysing and generating text by understanding the contextual relationships between words,” Beal says, “enhancing tasks like translation, summarisation, and text generation through self-attention mechanisms that process sequences simultaneously.” Self-attention. Sequences. We don’t even need to return to the ‘training players’ metaphor to apply this to football.
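For the curious, here is a bare-bones sketch of what 'self-attention' does with a sequence. It is deliberately simplified: a real transformer learns separate query, key, and value transformations, whereas this version uses the raw input vectors for all three. The three 'words' and their two-number representations are invented for illustration.

```python
import math

# Invented two-number representations of three words in a sequence.
SEQUENCE = {
    "pass": [1.0, 0.0],
    "through-ball": [0.9, 0.1],
    "referee": [0.0, 1.0],
}


def softmax(scores):
    """Turn a list of scores into weights that add up to 1."""
    biggest = max(scores)
    exps = [math.exp(score - biggest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Let every item in the sequence look at every item, itself included."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity between this item and each item (scaled dot product).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights = softmax(scores)
        # New representation: a weighted blend of the whole sequence.
        blended = [sum(w * vec[d] for w, vec in zip(weights, vectors))
                   for d in range(dim)]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


outputs, weights = self_attention(list(SEQUENCE.values()))
for word, row in zip(SEQUENCE, weights):
    print(word, [round(w, 2) for w in row])
```

Each row shows how much one word 'attends' to every word in the sequence. With these toy vectors, 'pass' pays more attention to 'through-ball' than to 'referee', because their representations are more alike.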

Connecting the lines

The choices about which machine learning approach and architecture you use touch on the end goals, touch on the type of ‘mathematical problem’ it might be similar to, touch on the type of data that you have. They can even extend to how you arrange that data. Fantastically, football is a perfect sport for understanding this.

‘Tracking data’ gives the positions of the players and the ball in a way that looks like the top-down pitch map views on Football Manager or FIFA. You can view this data as an image, or you can view it as connections between points.

"If you look at tracking data frames, you can look at the players as one graph," explains Amod Sahasrabudhe, a machine learning engineer at Gemini Sports Analytics. A 'graph' in this case is another mathematical term; a collection of 'nodes' (think of circles) joined together by 'edges' (think of lines). "If the image representation allows the model to look at a 2D representation (of the position of the players and ball), the graph allows you to encode additional information like velocity of the player, acceleration, distance from each other, et cetera during the play."

Isn’t that fun? It’s like a real Transformer - the Optimus Prime kind, not the machine learning architecture - where the same thing can be configured differently for different uses. Some information can be picked up from both types of data layout, but other information might be easier to work with in one form or the other. “For example, if there are two nodes in your graph that are connected, you could pass in the distance from each node along with the angle created with each other,” Sahasrabudhe says. “You can also represent the velocity difference, acceleration difference which I think an image based representation might not be able to interpret.”
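As a hedged illustration of the graph layout, the sketch below turns one invented tracking frame into nodes and edges. The positions, velocities, and field names are all assumptions rather than any provider's real data format. Each node keeps its own position and velocity, and each edge stores the kind of pairwise information Sahasrabudhe describes: distance, angle, and difference in velocity.

```python
import math
from itertools import combinations

# One invented tracking frame: positions in metres, velocities in metres per second.
FRAME = {
    "player_a": {"position": (10.0, 20.0), "velocity": (1.0, 0.0)},
    "player_b": {"position": (13.0, 24.0), "velocity": (0.0, 2.0)},
    "ball": {"position": (10.0, 21.0), "velocity": (4.0, 3.0)},
}


def build_graph(frame):
    """Connect every pair of nodes and describe their relationship."""
    edges = {}
    for a, b in combinations(frame, 2):
        (ax, ay), (bx, by) = frame[a]["position"], frame[b]["position"]
        (avx, avy), (bvx, bvy) = frame[a]["velocity"], frame[b]["velocity"]
        edges[(a, b)] = {
            # Straight-line distance between the two nodes.
            "distance": math.hypot(bx - ax, by - ay),
            # Direction from a to b, in degrees from the x-axis.
            "angle": math.degrees(math.atan2(by - ay, bx - ax)),
            # Size of the difference between the two velocity vectors.
            "velocity_difference": math.hypot(bvx - avx, bvy - avy),
        }
    return {"nodes": frame, "edges": edges}


graph = build_graph(FRAME)
for pair, features in graph["edges"].items():
    print(pair, {name: round(value, 2) for name, value in features.items()})
```

An image of the same frame would show where everyone is, but the edge features here would have to be worked out by the model for itself.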

Let’s take a brief pause.

In a second we’re going to drop ‘neural networks’ in, and we’re not going to break stride when we get there. The pause that we’re taking is not because we need a run up, but because we don’t need one. Life is full of complex things that everyone grasps the gist of (gravity, evaporation, Spursiness). There’s no reason at all why machine learning shouldn’t be one of them.

Just like how training drills can teach footballers to improve their technical or tactical play, training a machine learning model involves repeating a task until it starts achieving the outcome you’re after. Picking that outcome might influence the approach you take. The approach will involve a machine learning ‘architecture’, much like deciding on the details of a training session. A neural network is a broad category of architecture (based on ideas about how the brain works with lots of ‘nodes’ which interact with each other). The nodes in a neural network do maths, which isn’t necessarily complicated when looking at one single node, but becomes incredibly powerful when scaled up, and lets the model learn complicated, non-linear relationships. Modern computing power lets us do that to a huge degree.

And ‘graph neural networks’ are a version of neural network where the data going into the model is arranged as a ‘graph’, a concept that basically looks like a football passing network. These ‘GNNs’ are also almost everywhere in the modern world.
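The core move in a graph neural network is often called 'message passing': each node updates itself using information from the nodes it is connected to. The sketch below shows one round of that on an invented four-player passing network, using a plain average. That is a big simplification, since a real GNN learns how to combine the messages instead of just averaging them, and the single feature here (how far up the pitch a player stands, from 0 to 100) is made up.

```python
# Invented passing network: who is connected to whom.
PASS_LINKS = {
    "keeper": ["centre_back"],
    "centre_back": ["keeper", "midfielder"],
    "midfielder": ["centre_back", "striker"],
    "striker": ["midfielder"],
}

# One invented feature per player: position up the pitch (0 to 100).
FEATURES = {
    "keeper": [5.0],
    "centre_back": [25.0],
    "midfielder": [55.0],
    "striker": [85.0],
}


def message_passing_step(links, features):
    """Replace each node's features with the average of itself and its neighbours."""
    updated = {}
    for node, neighbours in links.items():
        group = [features[node]] + [features[n] for n in neighbours]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


after_one_round = message_passing_step(PASS_LINKS, FEATURES)
after_two_rounds = message_passing_step(PASS_LINKS, after_one_round)
print(after_one_round)
print(after_two_rounds)
```

After one round the keeper's number reflects the centre back; after two it has been nudged by the midfielder as well, even though they never pass to each other directly. Information spreads along the lines of the network, one step per round.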

"Graph Neural Networks excel in tasks involving complex relational data, such as social network analysis and recommendation systems, finding patterns that traditional neural networks might miss," says Dr Ryan Beal. And after barely 2500 words of this post you’ll understand a summary of a research paper Beal co-authored, which used graph neural networks to help estimate player locations purely from event data (a type of data which usually only has player locations for the person making each ‘event’, like a pass or shot).

Fun, huh. And we’re almost at the end.

Being specific about objectives, choosing a data format, choosing a machine learning framework. It’s all in the service of achieving something. Here’s Maia Trower again, on what a data scientist might be thinking about when coming to a data science or machine learning task:

“When choosing an approach, I’d say the three things I’d think about are: One - what is the exact problem or question, and how will I evaluate whether or not I’ve solved or answered it? Two - what data do I have access to? Three - what models do I already know about that might be appropriate?”

Before the recent wave of generative AI buzz, data scientists were already applying these questions to techniques like regression models and neural networks. Now there’s a new type of tool in the arsenal.

The right tool for the job

Talk to a smart mathematician or data scientist and they’ll tell you a lot of smart stuff about maths and statistics. But talk to a really smart one, and they’ll tell you when all of that won’t be necessary.

Towards the start of this post, we talked about ‘defensive possession’. Sometimes the obvious objective (have the ball = score a goal) isn’t the real objective (avoid conceding). Similarly, creating the most sophisticated machine learning model might not be the true aim. It’ll usually be to help a coach or player improve something. Those two things might sound like the same thing, but they can be as different as Angeball and Mourinhoball.

The fact that something is a ‘tool’ is important. It’s important in ChatGPT, where OpenAI clearly built in a lot of safety rail-type features. Well-known examples are around its refusal to use slurs or give out bomb-making recipes. Guardrails could also help the ‘hallucination’ problem, but won’t be enough.

Remember when we said that humans turn thoughts into words but ChatGPT just lays them one after another based on probability? A technique called ‘retrieval augmented generation’ (RAG) adds a little bit of actual structured ‘thought’ in and around the language models.

To re-phrase the syllable-heavy term, the process augments (improves) the generative language model by retrieving some information (from a database).

This way, new information doesn’t need to enter into the large language model’s training data, where it would need to muscle past out-of-date information in the melee of probabilities.

When speaking with Dr Ryan Beal for this newsletter, he pointed out that the GPT-3.5 model underpinning ChatGPT still thought (at the time) that Harry Kane played for Tottenham (he’d moved to Bayern Munich several months earlier). For the language model to talk about Kane being at the right club, it would need to be re-trained with all its data. A retrieval augmented generation system, on the other hand, would fetch the data and then use the language model like an interface or conduit for the updated information.
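The retrieval half of that process can be sketched in a few lines. This is a toy under several assumptions: the 'database' is a two-item list, the matching is done by counting shared words (real systems usually use something more sophisticated), and the language model itself is left out entirely. The output is just the text that would be handed to it.

```python
import re

# A tiny stand-in for an up-to-date database.
KNOWLEDGE_BASE = [
    "Harry Kane plays for Bayern Munich.",
    "Bill Shankly said football is a simple game complicated by idiots.",
]


def words(text):
    """Lower-case a piece of text and split it into a set of words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, documents):
    """Pick the document sharing the most words with the question."""
    return max(documents, key=lambda doc: len(words(question) & words(doc)))


def build_prompt(question, documents):
    """Put the retrieved fact in front of the question for the language model."""
    fact = retrieve(question, documents)
    return f"Answer using only this information: {fact}\nQuestion: {question}"


prompt = build_prompt("Which club does Harry Kane play for?", KNOWLEDGE_BASE)
print(prompt)
```

When Kane next moves, only the entry in the list needs to change. Nothing about the language model has to be re-trained.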

David Sumpter has an interesting framing of this kind of language model usage. “When we use LLMs, we don’t use them to model the world,” he says. “We use well-defined statistical models and mechanistic models. Instead we use the LLMs to communicate about our models.

“You should think of it like a visualisation, like a shot map. A visualisation does not tell you the single truth, but is a reasonable representation of the data. We use the LLM in the same way, to get a reasonable representation of the data.”

The who-s and what-s

Some of the who-s of machine learning in football are, of course, the contributors to this piece. There are others too, and the biggest English clubs are still steadily building up departments for it.

Money doesn’t guarantee good work, though. Ultimately, the real who-s and what-s will simply follow on from the how-s and why-s. The best work will be done by those combining the best domain knowledge (how) with the best questions (why).

Plus a little bit of creativity. “Skill without imagination is barren,” wrote Walter Isaacson about Leonardo da Vinci.

Skill and imagination. That’s basically all there is to it.

Many thanks to Dr Ryan Beal, Dr Pegah Rahimian, Amod Sahasrabudhe, David Sumpter, and Maia Trower for their help in writing this piece.