Engineering the (super)market

An aside before starting: I’m still working on what I hope to be some really interesting pieces on ‘AI’ in football, new data sources in football, and data in women’s football. If that sounds like your domain, or you have other topics you think Get Goalside should cover, please get in touch, either by replying to this email or at getgoalside.newsletter@gmail.com.

They’re taking a while to put together; the piece that follows is looser than I’d ideally like these newsletters to be but I hope there are parts of it that are useful to people.


If you watch the Premier League or WSL, you’ll have heard of Infinite Athlete. Which means it’s £40million well-spent to get on the front of Chelsea’s shirts.

The company say they want to be the ‘operating system for sports’: in a City AM interview, CEO Charlie Ebersol references Intel; another piece of promo uses the Google Maps API as a comparison. Both have bugged me, because I can’t quite get a grasp on what they mean, but to Infinite Athlete’s Infinite Credit, they’ve given me an excuse to think about metaphors for data.

Because I am, at best, an update or two away from being completely superseded by an OpenAI model, I find metaphors useful as a sort of categorisation test. If you don’t quite have your head around something, you can toss up a bunch of metaphors and say “is it more like this or that” and work out where you are.

Although I don’t quite understand Infinite Athlete’s chosen metaphors (or whether comparing yourself to both Intel and Google Maps works), trying to come up with one that I can wrap my head around is interesting. And I’ve hit upon one that feels like it has a wider usefulness.

I think that what Infinite Athlete are getting at is that they see sports data as a resource (like Intel did with electrical current or Google did with ‘where roads are’) which they can refine in some way (like Intel does through microprocessors or Google Maps does with data and API maintenance) to facilitate other things being created. They want to sync different data sources, all meshing nicely in one API. They want to sell you the convenience of your data in one place.

Like a supermarket?

It’s not a perfect analogy, but it provides a surprisingly good framework for football data as a wider landscape.

There are lots of different types of data that football organisations need. They come from all over. Some of it is even well-documented. But it’s a pain to traipse from market to market, picking up your event data groceries from one place and your physical data from another. And tracking data— you know that you should eat your greens, but frankly the fruit stall is up a hill and you don’t know what to do with pomegranates anyway. Your father never used tracking data and he turned out all right, scurvy or no scurvy.

Data companies, like specialist grocers, know that most customers will prefer either refined versions of the raw ingredients or some kind of ready-to-go product. Pre-sliced bread from the baker, pies from the butcher, stats portal from the data provider.

Sometimes there will even be crossover between grocers who’ve come to agreements between themselves. Deli meats and bread at the cheesemonger’s; Sportscode-ready data export options; a specific, strategic set of entity ID matching.
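
For a flavour of what that entity ID matching involves, here’s a minimal sketch in Python. Everything in it is made up for illustration: the two provider feeds, the ID formats, the field names and the `combine` function are all assumptions, not any real provider’s schema or API.

```python
# Hypothetical records: every name, ID and field here is invented for illustration.
event_rows = [  # from an imagined event data provider
    {"player_id": "EV-101", "name": "A. Example", "passes": 54},
    {"player_id": "EV-102", "name": "B. Sample", "passes": 31},
]
physical_rows = [  # from an imagined physical data provider
    {"athlete_id": 9001, "distance_km": 10.8},
    {"athlete_id": 9002, "distance_km": 9.4},
    {"athlete_id": 9003, "distance_km": 11.1},
]
# The entity ID matching table: the unglamorous part somebody has to maintain.
id_map = {"EV-101": 9001, "EV-102": 9002}


def combine(event_rows, physical_rows, id_map):
    """Join the two feeds via the mapping and report anything left unmatched."""
    physical_by_id = {row["athlete_id"]: row for row in physical_rows}
    combined, unmatched = [], []
    for event in event_rows:
        # A missing mapping or a missing physical row both end up as None here.
        physical = physical_by_id.get(id_map.get(event["player_id"]))
        if physical is None:
            unmatched.append(event["player_id"])
            continue
        combined.append({**event, "distance_km": physical["distance_km"]})
    return combined, unmatched


combined, unmatched = combine(event_rows, physical_rows, id_map)
print(combined)
print(unmatched)
```

The join itself is a handful of lines. The work is in building and maintaining `id_map` for every player, team and match, across every pair of providers you buy from.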

But wouldn’t it be easier if everything was just in one place?

It would. But that would also probably require a very different relationship between data collection companies and the people buying the data (or data products). The centralising party would become the main focus and contact point for the customer, and it’d probably benefit the ‘supermarket’ if customers could mix-and-match their purchases more easily. As with the Apple ecosystem, the pain of switching is a small reason for staying put. (And I should mention, data collection costs are high; you can understand why companies want to hang onto their customer base.)

Something vaguely similar happens at football organisations beyond the pitch as well. The commercial operations also have a lot of data from a variety of sources to deal with: membership, tickets, merchandise, any other types of fan engagement data. Although the ‘football’ and ‘business’ datasets don’t need to mix with each other, the problems in both are similar.

As with the footballing data, the cost of putting everything in the same place, let alone analysing it, is prohibitively high for many clubs. It was a point made multiple times on a recent episode of the Unofficial Partner podcast. “There are so, so many [clubs] and the vast majority of them are quite small businesses,” said Charlie Marshall, managing director of the European Club Association (ECA). “There is no economic sense whatsoever in hundreds and hundreds of small businesses all trying to build their own quite sophisticated data strategies, and the technology systems that support those data strategies.” [episode timestamp, ~1:01:00].

The discussion was about business data, and although the participants in that panel drew a line between that area and what happens on the pitch, the same is kind of true there too. It doesn’t make sense for every team to be doing very similar, boring things to build out their football data strategy. As some leagues have recognised: La Liga developed Mediacoach; the Premier League offered a combined feed of Stats Perform event data and Second Spectrum tracking data.

Those are quite wealthy leagues though, and even their solutions are a distance from a fully-fledged Sainsbury’s or Big Asda. Who’s helping all these clubs do their weekly shop?

You have Infinite Athlete, who have supermarket aspirations, but it remains to be seen whether they’ll be successful. You have data providers who, as discussed, often have their own product range alongside their basic ingredients. And now, increasingly, you have the big cloud services providers.

The cloud providers don’t appear to have direct supermarket aspirations, but they have all sorts of add-ons that can help manage and analyse the data once you’ve hooked it up. (Like a high street?) AWS has a bunch of promo about the sports organisations who use it. Both Databricks and Snowflake have similar content marketing you can read. Oracle, another cloud services provider, sponsor and service Red Bull’s Formula One team, but don’t hold that against them too much.

The twin paths of sports’ increasing interest in data and sports’ continued rise as a business proposition have caught Big Data’s eye. With that comes Big Investment and Big Start-Up (Cerberus or the dragon meme, depending on your viewpoint).

However, the space that these new companies can exist in - as ‘third parties’ outside of football organisations or data providers - will differ depending on the landscape of Big Data Engineering. (Infinite Athlete, to briefly return to them, exists to try and be the solution to the Big Data Engineering problem, which itself still relies on a certain formation of the wider engineering landscape).

A question for Get Goalside readers: Which of the following is the more likely winner of the next three-to-five years?

  • Interoperability between data providers becomes seamless on its own, allowing for integration of different data sources within a provider’s own product, or allowing for foolproof entity matching between any providers to use data in third-party applications like Tableau
  • Organisations will turn to cloud providers like AWS for API integration and setting up data storage, either through some (semi-)automation (AI anyone??) or as an affordable managed service
  • The above, but provided by domestic leagues or national FAs
  • The scale of the task will have simply shrunk enough for clubs of all sizes to hire employees for the set-up and maintenance of data pipelines, and creation of internal tools
  • None of the above, it’ll be as complex as always
  • Something else

If data isn’t easy to access across providers and across ‘users’ (organisations or the outside products or services they might be interested in), then either data engineering will be part of the product/service or there’ll be annoying integration friction (or both). This has echoes in a related business area:

“Looking back and assessing where we are, there’s a clear winner in Enterprise AI. Bolt-on, good-enough ML, sitting besides data storage and data processing has proven to be the option the market wants and everything else trails by a wide margin. 

Standalone AI/ML-focused offerings plunge teams into an endless cycle of POCs and procurement woes, where our best technical minds spend their days not innovating, but rather, navigating customer problems and bolstering operational capacity.”

— ‘The Hidden Cost of AI as a Service’ blog post

Convenience is, well, convenient. You could put a price on it. Infinite Athlete are banking on that price being something worth paying (hopefully enough to make back that £40million they spent on shirt sponsorship).

But it also matters because an API is harder to deal with than potatoes. You can parboil some spuds, heat some oil in a pan, add some garlic and herbs and make a delicious side dish. Try doing anything as efficiently and easily delightful with a football data API as good roast potatoes. Football data has a steep learning curve and arguably an even steeper implementation curve.

Data gets talked up with regard to scouting as a big time-saver, but there are an awful lot of game model-related things that coaches and analysts code up themselves which you could almost certainly automate with tracking data. But that’s out of reach to implement for almost everybody.

I go back to what Charlie Marshall said again: “There are so, so many [clubs] and the vast majority of them are quite small businesses.” All of these small businesses are having to work out how to use data, but many will not be able to work out how to use it (on their own) unless they’re able to actually use it. But their ability to use it is limited by either engineering costs where no provider-produced product exists, or the confines of the provider-produced product where it does.

Whichever way you look at the situation, it’s both understandable and slightly weird.