Football Analytics
A personal project where I develop football models and visualizations, combining machine learning with creative data storytelling to reveal insights hidden in match data.
Overview
Football analytics has grown rapidly in recent years, with clubs investing heavily in data science to gain a competitive edge. This project is my personal exploration of the field: I build machine learning models and data visualizations to extract meaningful insights from match data.
The repository showcases several approaches to understanding football performance, from expected goals (xG) models that quantify shot quality to network analysis that reveals team passing patterns and player relationships on the pitch.
Technology Stack
The project is built on a modern data stack, orchestrated through Docker for easy deployment and reproducibility:
Python
Data processing and machine learning using TensorFlow and scikit-learn, with MLflow for experiment tracking and model versioning.
R
All data visualization work leveraging dplyr for data manipulation, ggplot2 for plotting, and magick for image processing.
PostgreSQL
Storing and querying match data efficiently, enabling complex analytical queries across seasons and competitions.
Docker & GCP
Docker containers for a reproducible development environment, and Google Cloud Platform for running computationally intensive model training.
Visualizations
The heart of this project lies in creating compelling visualizations that make complex data accessible. Each visualization type serves a specific analytical purpose:
Pass Networks
Network diagrams showing passing relationships between players, revealing team structure, key playmakers, and how the ball flows during matches.
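As a rough illustration of the aggregation behind a pass network, here is a minimal sketch in Python. The project's plotting is done in R, so this only shows the data preparation step. The event keys (`passer`, `recipient`, `x`, `y`) and the function name `build_pass_network` are hypothetical and not taken from the repository.

```python
from collections import Counter, defaultdict


def build_pass_network(passes):
    """Aggregate completed passes into node positions and weighted edges.

    `passes` is an iterable of dicts with the (assumed) keys
    passer, recipient, x, y, where (x, y) is the pass start location.
    """
    edge_counts = Counter()
    locations = defaultdict(list)
    for p in passes:
        # Edge weight = number of passes from passer to recipient.
        edge_counts[(p["passer"], p["recipient"])] += 1
        locations[p["passer"]].append((p["x"], p["y"]))

    # Node position = average location the player passed from.
    nodes = {
        player: (
            sum(x for x, _ in pts) / len(pts),
            sum(y for _, y in pts) / len(pts),
        )
        for player, pts in locations.items()
    }
    return nodes, edge_counts


# Toy events for illustration only, not real match data.
passes = [
    {"passer": "A", "recipient": "B", "x": 30.0, "y": 40.0},
    {"passer": "A", "recipient": "B", "x": 50.0, "y": 20.0},
    {"passer": "B", "recipient": "C", "x": 60.0, "y": 50.0},
]
nodes, edges = build_pass_network(passes)
```

In a plot, the node dictionary would typically drive marker positions and the edge counts would drive line widths.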
Pass Sonars
Radial visualizations showing the direction and frequency of a player's passes, revealing playing style and preferred passing patterns.
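The binning step behind a pass sonar can be sketched as follows. This is a minimal Python example under stated assumptions: passes are given as `(x_start, y_start, x_end, y_end)` tuples, the attacking direction is along the positive x axis, and eight direction bins are used. The function name `pass_sonar_bins` is hypothetical.

```python
import math
from collections import Counter


def pass_sonar_bins(passes, n_bins=8):
    """Count passes per direction bin.

    Bin 0 is centred on the positive x axis (assumed attacking direction);
    bins proceed counter-clockwise.
    """
    width = 2 * math.pi / n_bins
    counts = Counter()
    for x0, y0, x1, y1 in passes:
        # Pass direction as an angle in [0, 2*pi).
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        # Shift by half a bin so each bin is centred on its direction.
        shifted = (angle + width / 2) % (2 * math.pi)
        idx = int(shifted // width) % n_bins
        counts[idx] += 1
    return [counts[i] for i in range(n_bins)]


# Toy passes: two forward, one sideways (+y), one backward.
toy = [(0, 0, 10, 0), (5, 5, 15, 5), (0, 0, 0, 10), (0, 0, -10, 0)]
bins = pass_sonar_bins(toy)
```

Each count would become the length (or colour) of one wedge in the radial chart.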
xG Maps
Shot maps colored by expected goal value, showing not just where players shoot from but the quality of chances they create and convert.
Assist-Shot Cluster Maps
Spatial analysis showing where teams create chances from, identifying patterns in attacking play and dangerous zones for opponents.
Performance Rolling Means
Time series visualizations tracking team performance metrics over the season, smoothing out match-to-match variance to reveal trends.
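The smoothing itself is a trailing rolling mean. Below is a minimal Python sketch of that calculation; the window length and the per-match metric are assumptions for illustration, and the function name `rolling_mean` is hypothetical.

```python
from collections import deque


def rolling_mean(values, window):
    """Trailing rolling mean over `window` matches.

    Returns None for the first window - 1 entries, where the
    window is not yet full.
    """
    buf = deque(maxlen=window)
    total = 0.0
    out = []
    for v in values:
        if len(buf) == window:
            # The oldest value is about to be dropped by the deque.
            total -= buf[0]
        buf.append(v)
        total += v
        out.append(total / window if len(buf) == window else None)
    return out


# Toy per-match values for illustration only.
smoothed = rolling_mean([1.0, 2.0, 3.0, 4.0], window=2)
```

A longer window gives a smoother line but reacts more slowly to genuine changes in form.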
Machine Learning Models
Beyond visualization, the project includes several predictive models:
- Expected Goals (xG) Model: Predicting the probability of a shot resulting in a goal based on location, angle, body part, and game context.
- Expected Assists (xA) Model: Quantifying the quality of chances created by passes, independent of whether the recipient scores.
- Possession2Vec Model: Embedding passing sequences into vector space to find similar patterns and cluster playing styles.
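To illustrate the location and angle inputs mentioned for the xG model, here is a minimal Python sketch of two common shot features: distance to the goal centre and the angle subtended by the goal mouth. The 120 x 80 pitch coordinates, the goal at `x = 120`, and the 8-unit goal width are assumptions for this example, not details taken from the repository, and `shot_features` is a hypothetical helper.

```python
import math

# Assumed coordinate system: pitch is 120 x 80, attacking towards x = 120.
GOAL_X = 120.0
GOAL_Y = 40.0
GOAL_WIDTH = 8.0


def shot_features(x, y):
    """Return (distance to goal centre, angle subtended by the goal mouth)."""
    dx = GOAL_X - x
    distance = math.hypot(dx, y - GOAL_Y)
    # Angles from the shot location to each post; their difference is
    # the opening angle the shooter sees, in radians.
    to_far_post = math.atan2(GOAL_Y + GOAL_WIDTH / 2 - y, dx)
    to_near_post = math.atan2(GOAL_Y - GOAL_WIDTH / 2 - y, dx)
    angle = abs(to_far_post - to_near_post)
    return distance, angle


central = shot_features(108.0, 40.0)  # 12 units out, straight in front
wide = shot_features(108.0, 70.0)     # same depth, far out wide
```

Features like these, together with body part and game context, would then be passed to a classifier that outputs a goal probability.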
Project Architecture
The repository is organized into modular components:
Impact & Learning
This project represents a deep dive into sports analytics, combining several disciplines: statistical modeling, machine learning, data visualization, and domain expertise in football.
The visualizations have been shared on Twitter/X, sparking conversations with other football analytics enthusiasts and contributing to the growing open-source community around sports data science.
Key learnings include the importance of domain knowledge in feature engineering, the power of visualization for communicating complex analysis, and the challenges of working with real-world sports data that is often incomplete or inconsistent.
Explore the code
Check out the full repository with models and visualization code.