Inspiration

Customer churn is a critical problem for businesses, as retaining existing customers is often more cost-effective than acquiring new ones. We wanted to build a simple yet practical system that can help identify customers who are likely to leave, using data-driven insights. The goal was to simulate a real-world analytics pipeline that could later be extended into an AI-powered decision system.


What it does

Artemis is a customer churn analysis pipeline that processes raw customer data and classifies users into different churn risk categories — High, Medium, and Low.

It helps answer key business questions such as:

  • Which customers are at risk of leaving?
  • What patterns indicate churn?
  • How do customer activity and spending relate to churn risk?

The system transforms raw data into structured insights that can be easily used for decision-making.
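Because the classification is rule-based, the core logic can be illustrated in a few lines of plain Python. The field names and thresholds below are assumptions for illustration only, not the actual rules used in the pipeline:

```python
def churn_risk(days_since_last_login: int, monthly_spend: float) -> str:
    """Toy rule-based churn classifier; thresholds are illustrative."""
    if days_since_last_login > 60 and monthly_spend < 10:
        return "High"
    if days_since_last_login > 30 or monthly_spend < 25:
        return "Medium"
    return "Low"

print(churn_risk(90, 5))  # a long-inactive, low-spend customer -> "High"
```

Keeping the rules as a short, ordered list of conditions is what makes the resulting categories easy to explain to business users.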


How we built it

We built Artemis using Databricks and Apache Spark, following a Medallion Architecture approach:

  • Bronze Layer: Ingested raw customer data from CSV into Delta tables
  • Silver Layer: Cleaned the data, handled null values, and performed feature engineering
  • Gold Layer: Applied business logic to classify customers into churn risk categories

We used PySpark for transformations and Delta Lake for efficient storage and querying. The pipeline is designed to be scalable and production-ready.


Challenges we ran into

  • Designing meaningful churn logic with limited features
  • Ensuring clean and consistent data across transformations
  • Structuring the pipeline to follow industry-standard architecture within limited time
  • Balancing simplicity (for quick build) with real-world relevance

Accomplishments that we're proud of

  • Successfully built an end-to-end data pipeline in a short time
  • Implemented Medallion Architecture (Bronze → Silver → Gold)
  • Created a clear and explainable churn classification system
  • Made the project scalable and ready for AI integration

What we learned

  • Hands-on experience with Databricks and PySpark workflows
  • Importance of data cleaning and feature engineering in analytics
  • How to structure data pipelines using Delta Lake
  • How business logic translates into data-driven insights

What's next for Artemis

  • Integrate Databricks AI Agents for natural language querying
  • Add Genie to enable NL → SQL conversion
  • Upgrade from rule-based logic to machine learning models
  • Build interactive dashboards for business users
  • Add real-time data streaming for live churn monitoring

The vision is to evolve Artemis into a fully AI-powered customer intelligence platform.

Built With

Databricks · Apache Spark · PySpark · Delta Lake
