Inspiration

Customer churn is a critical problem for businesses, as retaining existing customers is often more cost-effective than acquiring new ones. We wanted to build a simple yet practical system that can help identify customers who are likely to leave, using data-driven insights. The goal was to simulate a real-world analytics pipeline that could later be extended into an AI-powered decision system.


What it does

Artemis is a customer churn analysis pipeline that processes raw customer data and classifies users into different churn risk categories — High, Medium, and Low.

It helps answer key business questions such as:

  • Which customers are at risk of leaving?
  • What patterns indicate churn?
  • How do customer activity and spending relate to churn risk?

The system transforms raw data into structured insights that can be easily used for decision-making.
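Because the classification is rule-based, the core logic can be illustrated in a few lines of plain Python. The field names and thresholds below are assumptions for illustration only, not the actual rules used in the pipeline:

```python
def churn_risk(days_since_last_login: int, monthly_spend: float) -> str:
    """Toy rule-based churn classifier; thresholds are illustrative."""
    if days_since_last_login > 60 and monthly_spend < 10:
        return "High"
    if days_since_last_login > 30 or monthly_spend < 25:
        return "Medium"
    return "Low"

print(churn_risk(90, 5))  # a long-inactive, low-spend customer -> "High"
```

Keeping the rules as a short, ordered list of conditions is what makes the resulting categories easy to explain to business users.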


How we built it

We built Artemis using Databricks and Apache Spark, following a Medallion Architecture approach:

  • Bronze Layer: Ingested raw customer data from CSV into Delta tables
  • Silver Layer: Cleaned the data, handled null values, and performed feature engineering
  • Gold Layer: Applied business logic to classify customers into churn risk categories

We used PySpark for transformations and Delta Lake for efficient storage and querying. The pipeline is designed to be scalable and production-ready.


Challenges we ran into

  • Designing meaningful churn logic with limited features
  • Ensuring clean and consistent data across transformations
  • Structuring the pipeline to follow industry-standard architecture within limited time
  • Balancing simplicity (for quick build) with real-world relevance

Accomplishments that we're proud of

  • Successfully built an end-to-end data pipeline in a short time
  • Implemented Medallion Architecture (Bronze → Silver → Gold)
  • Created a clear and explainable churn classification system
  • Made the project scalable and ready for AI integration

What we learned

  • Hands-on experience with Databricks and PySpark workflows
  • Importance of data cleaning and feature engineering in analytics
  • How to structure data pipelines using Delta Lake
  • How business logic translates into data-driven insights

What's next for Artemis

  • Integrate Databricks AI Agents for natural language querying
  • Add Genie to enable NL → SQL conversion
  • Upgrade from rule-based logic to machine learning models
  • Build interactive dashboards for business users
  • Add real-time data streaming for live churn monitoring

The vision is to evolve Artemis into a fully AI-powered customer intelligence platform.

Built With

Databricks · Apache Spark · PySpark · Delta Lake
