How It's Built

This dashboard is the final surface of a six-project data engineering platform built entirely on Google Cloud. Raw CSV files go in at one end; interactive analytics come out at the other. Every layer in between (ingestion, transformation, quality checks, orchestration, and deployment) is version-controlled, tested in CI/CD, and documented with the rationale behind each decision.

End-to-End Pipeline

CSV / Parquet (Source Data) → BigQuery (Raw Tables) → Dataform (stg / int / marts) → Quality Assertions → FastAPI (This Dashboard) → Cloud Run (Public URL)

GitHub Actions CI/CD: lint, test, build, deploy on every push to main

What Was Built

P01
Claims Data Warehouse

Synthetic insurance data (policyholders, policies, claims, payments, coverages) loaded into BigQuery. Dataform manages 5 staging tables, 3 intermediate transforms, 6 dimension/fact tables, and 2 report views. 16 data quality assertions run on every deploy.

P02
Orchestrated ELT Pipeline

A Cloud Run container runs the full extract-load-transform cycle on a schedule. Cloud Run with Cloud Scheduler was chosen over Cloud Composer because it offers comparable reliability at a fraction of the cost for a single-DAG workload.

P03
Streaming Intake

A Pub/Sub topic receives claim events in real time. A Cloud Run subscriber validates and deduplicates each event, then writes it to BigQuery. This project demonstrates event-driven architecture alongside the batch ELT pattern.
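
The validate-then-deduplicate step can be sketched in plain Python. This is a minimal illustration, not the subscriber's actual code: the field names (`event_id`, `claim_id`, `amount`), the validation rule, and the in-memory `Deduplicator` are all assumptions made for the example. Pub/Sub delivers at least once by default, so the same event can arrive more than once, which is why a subscriber needs some form of deduplication.

```python
from dataclasses import dataclass, field

# Hypothetical required fields; the real event schema is not given in the text.
REQUIRED_FIELDS = ("event_id", "claim_id", "amount")


def validate(event: dict) -> bool:
    """Return True when every required field is present and amount is a non-negative number."""
    if any(name not in event for name in REQUIRED_FIELDS):
        return False
    amount = event["amount"]
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    return is_number and amount >= 0


@dataclass
class Deduplicator:
    """Remembers event IDs already seen. In-memory only, for illustration."""
    seen: set = field(default_factory=set)

    def is_new(self, event_id: str) -> bool:
        if event_id in self.seen:
            return False
        self.seen.add(event_id)
        return True


def process(events: list[dict], dedup: Deduplicator) -> list[dict]:
    """Keep valid, first-seen events: the rows a subscriber would go on to write."""
    rows = []
    for event in events:
        if validate(event) and dedup.is_new(event["event_id"]):
            rows.append(event)
    return rows
```

A deployed subscriber would need dedup state that survives restarts and is shared across instances; the in-memory set here only shows the control flow.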

P04
Data Quality Framework

Dataform assertions enforce referential integrity, uniqueness, and row-level conditions. If a quality check fails, the pipeline stops before bad data reaches the analytics layer.
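
The gating behavior can be illustrated with a small Python sketch. It is not Dataform code: the helper names (`unique`, `references`, `row_condition`, `run_gate`) and the sample columns are hypothetical. Each check returns the rows that violate it, so a check passes when it returns nothing, and the first failing check raises before any rows are handed downstream.

```python
class QualityCheckFailed(Exception):
    """Raised when a check finds violating rows; stops the run."""


def unique(column):
    """Check that no value of `column` appears more than once."""
    def check(rows):
        seen, duplicates = set(), []
        for row in rows:
            if row[column] in seen:
                duplicates.append(row)
            seen.add(row[column])
        return duplicates
    return check


def references(column, valid_keys):
    """Check referential integrity: every value of `column` must exist in `valid_keys`."""
    def check(rows):
        return [row for row in rows if row[column] not in valid_keys]
    return check


def row_condition(predicate):
    """Check a row-level condition: rows where `predicate` is false are violations."""
    def check(rows):
        return [row for row in rows if not predicate(row)]
    return check


def run_gate(rows, checks):
    """Run checks in order; raise on the first failure, otherwise pass rows through."""
    for name, check in checks.items():
        violations = check(rows)
        if violations:
            raise QualityCheckFailed(f"{name}: {len(violations)} violating row(s)")
    return rows
```

For example, `run_gate(claims, {"unique_claim_id": unique("claim_id")})` returns `claims` unchanged when every `claim_id` is distinct and raises `QualityCheckFailed` otherwise.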

P05
Cost Governance

BigQuery slot reservations, budget alerts, and resource labeling. The whole platform runs on GCP's free tier and pay-per-query pricing, with a total monthly cost under $5 USD.

P06
Pricing ML Model

A GLM (Generalized Linear Model) scores 144K policies with a predicted pure premium. Output lands in dev_pricing_ml.model_scoring and feeds the Pricing Adequacy page of this dashboard.
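
To make the scoring step concrete, here is a minimal sketch of how a GLM turns policy features into a predicted pure premium. It assumes a log link, which is common for pricing GLMs but is not stated above, and the intercept, coefficients, and feature names are invented for illustration rather than taken from the actual model.

```python
import math

# Hypothetical log-scale parameters; not the fitted values from the real model.
INTERCEPT = 5.0
COEFFICIENTS = {"driver_age_under_25": 0.40, "urban": 0.15}


def predict_pure_premium(features: dict) -> float:
    """Score one policy: exp(intercept + sum of coefficient * feature value).

    Features missing from the input are treated as 0.
    """
    linear_predictor = INTERCEPT + sum(
        coefficient * features.get(name, 0.0)
        for name, coefficient in COEFFICIENTS.items()
    )
    return math.exp(linear_predictor)
```

Under a log link each coefficient acts multiplicatively: with these made-up values, setting `urban` to 1 scales the prediction by `exp(0.15)` regardless of the other features.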

Key Technical Decisions

  • Cloud Run over Cloud Composer — Airflow is overkill for a single pipeline. Cloud Run + Scheduler costs ~$0 vs $300+/mo for Composer.
  • Dataform over dbt — Native BigQuery integration, no additional infrastructure, free tier covers this workload.
  • FastAPI over Streamlit — Full control over HTML, CSS, and performance. Streamlit's default look signals "prototype" in a portfolio.
  • GCS + BigQuery over Postgres — The project demonstrates cloud-native data engineering, not application-database patterns.