Simulating an App Store / Google Play Data Pipeline

In this project I use different tools to simulate a full App Store data pipeline, following the journey the data takes and showing how different technologies help at each step. My main goal is to showcase a variety of concepts and tools I have experience with.

Hover over the elements in the overview to see a short description, and click the tabs in the components section to explore each stage in detail.

Data Science, Data Engineering, Machine Learning, Google Cloud Platform (GCP), Real-Time Streaming, MLOps, BigQuery, Vertex AI, Non-homogeneous Poisson process

Link to GitHub repository

Architecture Overview

Data Generation

Python script to synthesize user interactions
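
As a rough illustration of how interaction timestamps could be synthesized with a non-homogeneous Poisson process, here is a minimal sketch using the thinning method. The intensity function `rate`, its daily cycle, and all numeric values are assumptions made up for this example; they are not taken from the project's actual script.

```python
import math
import random


def rate(t_hours: float) -> float:
    """Hypothetical intensity in events per hour, with a daily cycle peaking at 20:00."""
    return 60.0 + 40.0 * math.sin(2 * math.pi * (t_hours - 14.0) / 24.0)


def simulate_nhpp(rate_fn, rate_max, t_end, rng):
    """Simulate event times on [0, t_end) by thinning.

    Candidates are drawn from a homogeneous Poisson process with rate
    rate_max, and each candidate at time t is kept with probability
    rate_fn(t) / rate_max. Requires rate_fn(t) <= rate_max everywhere.
    """
    times = []
    t = 0.0
    while True:
        # Exponential gap to the next candidate of the homogeneous process.
        t += rng.expovariate(rate_max)
        if t >= t_end:
            break
        # Accept the candidate in proportion to the local intensity.
        if rng.random() < rate_fn(t) / rate_max:
            times.append(t)
    return times


rng = random.Random(42)  # fixed seed so runs are repeatable
events = simulate_nhpp(rate, rate_max=100.0, t_end=24.0, rng=rng)
```

Each value in `events` is a timestamp in hours since midnight. More timestamps fall in the evening, where the assumed intensity is highest.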

Data Ingestion

GCP Pub/Sub (alt. Kafka)
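
A Pub/Sub message carries a byte payload plus optional string attributes, so each synthetic event has to be serialized before it is published. The sketch below shows one possible encoding using only the standard library. The event fields (`event_type`, `app_id`, `user_id`, `ts`) and the helper names are hypothetical, and the actual publish call is left out.

```python
import json


def encode_event(event: dict) -> tuple[bytes, dict[str, str]]:
    """Turn an event dict into (payload bytes, string attributes)."""
    data = json.dumps(event, sort_keys=True).encode("utf-8")
    # Attributes must be strings; here the event type is exposed as one.
    attributes = {"event_type": str(event["event_type"])}
    return data, attributes


def decode_event(data: bytes) -> dict:
    """Inverse of encode_event for the payload part."""
    return json.loads(data.decode("utf-8"))


# Hypothetical event, for illustration only.
sample = {"event_type": "install", "app_id": "app-123", "user_id": "u-42", "ts": 20.5}
payload, attrs = encode_event(sample)
```

Keeping the event type in an attribute as well as in the payload is one design option: it lets a consumer route on it without parsing the JSON body.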

Processing

GCP Dataflow, Apache Beam (alt. Spark/Flink)
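
A typical streaming step at this stage is counting events per fixed time window. The following plain-Python sketch only illustrates that logic; it is not Beam or Dataflow code, and the event types and window size are assumptions chosen for the example.

```python
from collections import Counter


def fixed_window_counts(events, window_seconds):
    """Count events per (window_start, event_type).

    events is an iterable of (timestamp_seconds, event_type) pairs.
    Each event is assigned to the window [start, start + window_seconds)
    that contains its timestamp.
    """
    counts = Counter()
    for ts, event_type in events:
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, event_type)] += 1
    return dict(counts)


# Hypothetical events, for illustration only.
sample_events = [
    (3.0, "install"),
    (41.5, "review"),
    (59.9, "install"),
    (60.0, "install"),
    (125.2, "uninstall"),
]
counts = fixed_window_counts(sample_events, window_seconds=60)
```

An event at exactly 60.0 seconds lands in the window starting at 60, not the one starting at 0, because windows are closed on the left and open on the right.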

Warehouse

BigQuery, Google Cloud Storage+Parquet

BI Dashboards

Metabase (alt. Power BI, Tableau)

ML Tasks

BigQueryML and Vertex AI

Components Description

Explore Stages
