2019-2020 | Data Engineering & Marketing Technology

Customer Data Platform

A Customer Data Platform built on BigQuery and Airflow that turns customer data into actionable marketing segments, enabling ML-powered audience segmentation for acquisition and CRM teams.

BigQuery, Airflow, GCP, Machine Learning, CDP, Data Engineering

Overview

In the competitive e-commerce landscape, understanding and activating customer data is crucial for effective marketing. Maison du Monde needed a unified Customer Data Platform (CDP) to consolidate customer information from multiple sources, create intelligent audience segments, and activate them across acquisition and CRM channels.

This project built a complete CDP infrastructure on Google Cloud Platform, using BigQuery for data storage and transformation, Airflow for orchestration, and machine learning models for audience segmentation. The platform enabled marketing teams to select pre-defined segments and push them directly to advertising platforms (Facebook Ads, Google Ads, Criteo) and to the CRM system (Selligent).

Architecture

The CDP was built on a modern data stack. Data flowed from source to activation through the following stages:

Data Sources (BigQuery)
Airflow Orchestration
Data Transformation & ML Models
CDP Segments (BigQuery Tables)
Activation Platforms
• Facebook Ads
• Google Ads
• Criteo
• CRM (Selligent)

Data Ingestion

All customer data was consolidated in BigQuery from various sources: e-commerce transactions, website interactions, customer service interactions, and marketing touchpoints. Airflow orchestrated the ingestion pipelines to ensure data freshness and reliability.

Data Transformation

Complex SQL transformations in BigQuery cleaned, enriched, and unified customer profiles. Data was processed to create a single source of truth for customer attributes, behaviors, and preferences.
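
To illustrate the unification step, here is a minimal Python sketch of merging per-source records into one profile per customer. The production logic was written as BigQuery SQL; the matching key (a normalized email), the field names, and the `unify_profiles` helper are assumptions made for this example only.

```python
def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so the same address matches across sources."""
    return email.strip().lower()


def unify_profiles(records):
    """Merge per-source records into one profile per normalized email.

    Later records overwrite earlier values for the same field, but empty
    values (None or "") never overwrite an existing attribute.
    """
    profiles = {}
    for rec in records:
        key = normalize_email(rec["email"])
        profile = profiles.setdefault(key, {"email": key, "sources": []})
        if rec["source"] not in profile["sources"]:
            profile["sources"].append(rec["source"])
        for field, value in rec.items():
            if field in ("email", "source") or value is None or value == "":
                continue
            profile[field] = value
    return profiles
```

A real identity-resolution step would usually match on more than one key (customer ID, loyalty number, device) and apply explicit precedence rules per source.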

ML-Powered Segmentation

Machine learning models analyzed customer behavior patterns to create intelligent segments: purchase propensity, churn risk, product affinity, lifetime value predictions, and more.

Audience Activation

Segments were materialized as BigQuery tables and automatically synced to advertising platforms and CRM systems, enabling marketing teams to activate audiences with a single click.

Data Pipeline with Airflow & BigQuery

Apache Airflow orchestrated the entire data pipeline, ensuring reliable, scheduled execution:

  • Scheduled Ingestion: Daily workflows pulled customer data from source systems into BigQuery, handling incremental updates and full refreshes as needed
  • Data Transformation: SQL-based transformations in BigQuery unified customer profiles, calculated metrics, and prepared data for ML model training
  • ML Model Execution: Machine learning models ran on BigQuery ML or external services, generating predictions and segment assignments for each customer
  • Segment Materialization: Final audience segments were written to dedicated BigQuery tables, ready for activation by marketing teams
  • Error Handling & Monitoring: Airflow provided comprehensive monitoring, alerting, and retry logic to ensure pipeline reliability
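
The stages above can be sketched as a small dependency graph with retries. This is not Airflow code and not the project's DAG: it uses Python's standard `graphlib` module to show the ordering idea, and the task names and the `run_with_retries` helper are hypothetical.

```python
import time
from graphlib import TopologicalSorter

# Each task maps to the set of upstream tasks it depends on (names are hypothetical).
PIPELINE = {
    "ingest": set(),
    "transform": {"ingest"},
    "run_models": {"transform"},
    "materialize_segments": {"run_models"},
}


def run_with_retries(task, retries=2, delay_s=0.0):
    """Call task(); on failure retry up to `retries` times, then re-raise."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay_s)


def run_pipeline(tasks, graph=PIPELINE):
    """Run every task in dependency order and return the order used."""
    order = list(TopologicalSorter(graph).static_order())
    for name in order:
        run_with_retries(tasks[name])
    return order
```

In Airflow itself, ordering is declared between tasks in a DAG and retries are configured per task, so a wrapper like this is only needed outside the scheduler.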

Machine Learning Segmentation

The CDP leveraged machine learning models to create intelligent, data-driven audience segments:

Purchase Propensity Models

Predicted the likelihood that a customer would purchase within a specific time window, enabling acquisition teams to target high-intent audiences in retargeting campaigns.
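
As a rough illustration of how a propensity score can become a segment, the sketch below scores customers with a logistic function and keeps those above a threshold. The feature names, the coefficients, and the 0.5 threshold are invented for this example; they are not the project's model.

```python
import math

# Hypothetical coefficients for illustration only; not fitted to any data.
WEIGHTS = {
    "intercept": -2.0,
    "orders_90d": 0.8,
    "sessions_30d": 0.1,
    "days_since_last_visit": -0.05,
}


def propensity(features):
    """Return a score in (0, 1) from a linear combination passed through a sigmoid."""
    z = WEIGHTS["intercept"] + sum(
        weight * features.get(name, 0.0)
        for name, weight in WEIGHTS.items()
        if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))


def high_intent(customers, threshold=0.5):
    """Return the IDs of customers whose score meets the threshold."""
    return [cid for cid, feats in customers.items() if propensity(feats) >= threshold]
```

In practice the coefficients would come from a trained model, and the threshold would be tuned against campaign cost and audience size.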

Churn Risk Segmentation

Identified customers at risk of churning, allowing CRM teams to engage with retention campaigns and win-back strategies before customers were lost.

Product Affinity Clustering

Grouped customers by product preferences and purchase patterns, enabling personalized product recommendations and category-specific campaigns.
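
A simple way to picture affinity grouping is to assign each customer to the category that dominates their spend. This rule-based stand-in is not a clustering algorithm and is not the method used in the project; the category names and the 50% share cut-off are assumptions.

```python
from collections import defaultdict


def affinity_segments(purchases, min_share=0.5):
    """Map each customer to their dominant category, or "mixed" if none dominates.

    `purchases` is an iterable of (customer_id, category, amount) tuples.
    """
    spend = defaultdict(lambda: defaultdict(float))
    for customer_id, category, amount in purchases:
        spend[customer_id][category] += amount

    segments = {}
    for customer_id, by_category in spend.items():
        total = sum(by_category.values())
        top_category, top_amount = max(by_category.items(), key=lambda kv: kv[1])
        dominant = total > 0 and top_amount / total >= min_share
        segments[customer_id] = top_category if dominant else "mixed"
    return segments
```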

Lifetime Value Prediction

Estimated customer lifetime value to optimize acquisition spend and prioritize high-value customer segments for premium campaigns.
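
For intuition, a naive lifetime value estimate multiplies average order value by purchase frequency and an expected customer lifespan. The sketch below shows that baseline only; the project used predictive models, and the three-year lifespan default is an arbitrary assumption.

```python
def naive_ltv(order_values, observed_years, expected_years=3.0):
    """Baseline LTV: average order value x orders per year x expected lifespan."""
    if not order_values or observed_years <= 0:
        return 0.0
    average_order_value = sum(order_values) / len(order_values)
    orders_per_year = len(order_values) / observed_years
    return average_order_value * orders_per_year * expected_years
```

For example, two orders of 100 and 200 observed over one year give an average order value of 150 at two orders per year, or 900 over three years.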

Activation Platforms

The CDP integrated with multiple marketing platforms, enabling teams to activate segments across acquisition and CRM channels:

Acquisition Platforms

  • Facebook Ads: Custom audiences for retargeting and lookalike campaigns
  • Google Ads: Customer match lists for search and display campaigns
  • Criteo: Dynamic retargeting audiences based on browsing behavior
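
Audience uploads of this kind (for example Facebook custom audiences and Google customer match) commonly expect identifiers such as emails to be normalized and SHA-256 hashed before they leave the data warehouse. The source does not describe how identifiers were prepared, so the sketch below is an assumption about that step, and `build_audience_payload` is a hypothetical helper.

```python
import hashlib


def hash_email(email: str) -> str:
    """Normalize (trim, lowercase) and return the SHA-256 hex digest of an email."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_audience_payload(emails):
    """Hash every non-empty email, dropping duplicates; sorted for stable output."""
    return sorted({hash_email(e) for e in emails if e and e.strip()})
```

Each platform documents its own normalization rules, so the exact preprocessing should be checked against the target API before uploading.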

CRM Platform

  • Selligent: Segmented email lists for personalized messaging and CRM campaigns
  • Lifecycle Marketing: Automated campaigns based on customer journey stage
  • Retention Programs: Targeted campaigns for at-risk segments

Marketing teams could select pre-defined segments from a catalog and push them to their respective platforms with a single action, dramatically reducing the time from insight to campaign activation.

User Experience for Marketing Teams

The CDP was designed to be accessible to non-technical marketing teams:

Segment Catalog

Teams could browse a catalog of pre-defined segments, each with clear descriptions of the audience characteristics and use cases. Segments were organized by marketing objective (acquisition, retention, upsell, etc.) and updated automatically as ML models retrained.
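
A catalog like this can be modeled as a list of segment descriptors filtered by objective. The sketch below is illustrative: the `Segment` fields, the segment names, and the table names are hypothetical, not the project's actual catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    objective: str    # e.g. "acquisition", "retention", "upsell"
    description: str
    table: str        # hypothetical BigQuery table holding the segment members


CATALOG = [
    Segment("high_purchase_propensity", "acquisition",
            "Customers likely to buy soon", "cdp.seg_high_propensity"),
    Segment("churn_risk", "retention",
            "Customers at risk of churning", "cdp.seg_churn_risk"),
    Segment("high_ltv", "upsell",
            "Customers with high predicted lifetime value", "cdp.seg_high_ltv"),
]


def browse(catalog, objective):
    """Return the segments that serve the given marketing objective."""
    return [segment for segment in catalog if segment.objective == objective]
```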

One-Click Activation

Once a segment was selected, teams could push it to their target platform (Facebook Ads, Google Ads, Criteo, or CRM) with a single click. The system handled all the technical complexity of API integration, data formatting, and synchronization.
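
One way to hide per-platform formatting behind a single action is a registry of formatter functions keyed by platform. The sketch below shows the pattern only: the platform keys and payload shapes are made up and do not reflect any real advertising or CRM API.

```python
from typing import Callable, Dict, List

# Registry of platform name -> function that formats member IDs for that platform.
FORMATTERS: Dict[str, Callable[[List[str]], object]] = {}


def register(platform: str):
    """Decorator that registers a formatter under a platform name."""
    def decorator(fn):
        FORMATTERS[platform] = fn
        return fn
    return decorator


@register("ads_platform_json")  # hypothetical platform expecting a JSON-like payload
def to_json_payload(member_ids):
    return {"members": list(member_ids)}


@register("crm_csv")  # hypothetical CRM import expecting CSV text
def to_csv(member_ids):
    return "member_id\n" + "\n".join(member_ids)


def activate(member_ids, platform):
    """Format a segment for the chosen platform, or fail clearly if unsupported."""
    formatter = FORMATTERS.get(platform)
    if formatter is None:
        raise ValueError(f"unsupported platform: {platform}")
    return formatter(member_ids)
```

With this shape, supporting a new platform means registering one more formatter, and the calling code stays unchanged.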

Impact & Results

The CDP transformed how Maison du Monde approached customer data and marketing activation:

  • Unified Customer View: Consolidated customer data from multiple sources into a single, reliable source of truth
  • Data-Driven Segmentation: ML-powered segments outperformed rule-based segments, improving campaign ROI and conversion rates
  • Faster Campaign Activation: Reduced time from segment creation to campaign launch from days to minutes
  • Cross-Channel Consistency: Same segments available across all platforms, ensuring consistent messaging and targeting
  • Scalable Infrastructure: BigQuery and Airflow provided the scalability to handle growing data volumes and an increasing number of segments

Technology Stack

Google BigQuery
Apache Airflow
Google Cloud Platform
BigQuery ML
Python
SQL
Facebook Ads API
Google Ads API
Criteo API

Key Learnings

Building this CDP provided valuable insights into data engineering and marketing technology:

  • Unified Data Model: Creating a single source of truth for customer data required careful data modeling and transformation logic to handle inconsistencies across source systems
  • ML in Production: Deploying ML models in a production CDP required robust pipelines for model training, evaluation, and deployment, with monitoring to detect model drift
  • API Integration Complexity: Each advertising platform had different APIs, data formats, and rate limits, requiring custom integration logic and error handling
  • User-Centric Design: Making the CDP accessible to non-technical marketing teams was crucial: the best data infrastructure is useless if users can't easily activate it
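
The rate-limit point in the API integration learning can be illustrated with a generic batching loop that backs off exponentially. This is a sketch under assumptions, not the project's integration code: `RateLimited`, the `send` callable, the batch size, and the delays are all hypothetical.

```python
import time


class RateLimited(Exception):
    """Raised by `send` when the remote API asks the caller to slow down."""


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def upload_in_batches(items, send, batch_size=1000, max_retries=3,
                      base_delay_s=1.0, sleep=time.sleep):
    """Send items in batches, retrying rate-limited batches with exponential backoff.

    Returns the number of items sent. Re-raises RateLimited once a batch
    has failed max_retries + 1 times.
    """
    sent = 0
    for batch in chunked(items, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send(batch)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise
                sleep(base_delay_s * 2 ** attempt)
        sent += len(batch)
    return sent
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a production version would also honor any retry hints the platform returns.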