Inspiration
Customer churn is a critical problem for businesses, as retaining existing customers is often more cost-effective than acquiring new ones. We wanted to build a simple yet practical system that can help identify customers who are likely to leave, using data-driven insights. The goal was to simulate a real-world analytics pipeline that could later be extended into an AI-powered decision system.
What it does
Artemis is a customer churn analysis pipeline that processes raw customer data and classifies customers into three churn risk categories: High, Medium, and Low.
It helps answer key business questions such as:
- Which customers are at risk of leaving?
- What patterns indicate churn?
- How do customer activity and spending relate to churn risk?
The system transforms raw data into structured insights that can be easily used for decision-making.
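As a rough illustration of what a rule-based High/Medium/Low classification can look like, here is a minimal Python sketch. The feature names (`days_since_last_activity`, `monthly_spend`) and every threshold are assumptions chosen for the example; they are not the rules Artemis actually uses.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    days_since_last_activity: int  # hypothetical activity feature
    monthly_spend: float           # hypothetical spending feature


def classify_churn_risk(
    c: Customer,
    inactive_days_high: int = 60,    # assumed threshold, for illustration only
    inactive_days_medium: int = 30,  # assumed threshold, for illustration only
    low_spend: float = 20.0,         # assumed threshold, for illustration only
) -> str:
    """Return 'High', 'Medium', or 'Low' from simple, explainable rules."""
    # Long inactivity is treated as the strongest churn signal.
    if c.days_since_last_activity >= inactive_days_high:
        return "High"
    # Moderate inactivity or low spending puts the customer in the middle tier.
    if c.days_since_last_activity >= inactive_days_medium or c.monthly_spend < low_spend:
        return "Medium"
    return "Low"


customers = [
    Customer("C001", 75, 12.0),
    Customer("C002", 35, 80.0),
    Customer("C003", 5, 150.0),
]
risk_by_customer = {c.customer_id: classify_churn_risk(c) for c in customers}
print(risk_by_customer)
```

Keeping the thresholds as named parameters makes each rule easy to explain to a business user and easy to tune later.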
How we built it
We built Artemis using Databricks and Apache Spark, following a Medallion Architecture approach:
- Bronze Layer: Ingested raw customer data from CSV into Delta tables
- Silver Layer: Cleaned the data, handled null values, and performed feature engineering
- Gold Layer: Applied business logic to classify customers into churn risk categories
We used PySpark for transformations and Delta Lake for efficient storage and querying. The pipeline is designed to scale with the data and to serve as a starting point for production use.
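The three layers can be sketched in a few lines. The version below deliberately uses plain Python instead of PySpark and Delta Lake so that it runs anywhere; it only mirrors the shape of the Bronze → Silver → Gold flow. The column names, the null-filling defaults, the derived `avg_order_value` feature, and the thresholds are all assumptions made for the example.

```python
import csv
import io

# Hypothetical raw input; in Artemis this would be a CSV file loaded into a Delta table.
RAW_CSV = """customer_id,days_since_last_activity,total_spend,num_orders
C001,75,120.0,4
C002,,300.0,10
C003,5,,0
,12,50.0,2
"""


def bronze(raw_text):
    """Bronze: ingest rows as-is (all values stay strings, nothing is cleaned)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def silver(rows):
    """Silver: drop rows without an id, fill nulls, cast types, derive a feature."""
    cleaned = []
    for r in rows:
        if not r["customer_id"]:
            continue  # a row without a key cannot be attributed to a customer
        # Filling missing values with 0 is an assumed default, not a recommendation.
        days = int(r["days_since_last_activity"] or 0)
        spend = float(r["total_spend"] or 0.0)
        orders = int(r["num_orders"] or 0)
        cleaned.append({
            "customer_id": r["customer_id"],
            "days_since_last_activity": days,
            "total_spend": spend,
            "num_orders": orders,
            # Engineered feature; guarded against division by zero.
            "avg_order_value": spend / orders if orders else 0.0,
        })
    return cleaned


def gold(rows):
    """Gold: apply business rules (illustrative thresholds) to assign churn risk."""
    result = []
    for r in rows:
        if r["days_since_last_activity"] >= 60:
            risk = "High"
        elif r["days_since_last_activity"] >= 30 or r["avg_order_value"] < 10.0:
            risk = "Medium"
        else:
            risk = "Low"
        result.append({"customer_id": r["customer_id"], "churn_risk": risk})
    return result


bronze_rows = bronze(RAW_CSV)
silver_rows = silver(bronze_rows)
gold_rows = gold(silver_rows)
for row in gold_rows:
    print(row)
```

Each stage takes only the previous stage's output as input, which is the property that lets a layer be recomputed or inspected on its own.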
Challenges we ran into
- Designing meaningful churn logic with limited features
- Ensuring clean and consistent data across transformations
- Structuring the pipeline to follow an industry-standard architecture within a limited time
- Balancing simplicity (for quick build) with real-world relevance
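One way to approach the data consistency challenge is to run a small check on each layer's output before the next layer reads it. The sketch below is a hypothetical helper in plain Python, not code from Artemis; the function name `check_layer` and the column names are assumptions.

```python
def check_layer(rows, required_columns, key="customer_id"):
    """Return a list of human-readable problems found in a layer's rows."""
    problems = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Flag required columns that are absent, None, or empty strings.
        missing = [c for c in required_columns if row.get(c) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {missing}")
        # Flag repeated keys, ignoring rows whose key is itself missing.
        k = row.get(key)
        if k not in (None, ""):
            if k in seen_keys:
                problems.append(f"row {i}: duplicate {key} {k!r}")
            seen_keys.add(k)
    return problems


sample_rows = [
    {"customer_id": "C001", "total_spend": 120.0},
    {"customer_id": "C001", "total_spend": 80.0},
    {"customer_id": "C002", "total_spend": None},
]
problems = check_layer(sample_rows, ["customer_id", "total_spend"])
for p in problems:
    print(p)
```

An empty result means the layer passed these two checks; anything else can be logged or used to stop the run before bad rows reach the next layer.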
Accomplishments that we're proud of
- Successfully built an end-to-end data pipeline in a short time
- Implemented Medallion Architecture (Bronze → Silver → Gold)
- Created a clear and explainable churn classification system
- Made the project scalable and ready for AI integration
What we learned
- Hands-on experience with Databricks and PySpark workflows
- Importance of data cleaning and feature engineering in analytics
- How to structure data pipelines using Delta Lake
- How business logic translates into data-driven insights
What's next for Artemis
- Integrate Databricks AI Agents for natural language querying
- Add Databricks Genie to enable natural language to SQL (NL → SQL) conversion
- Upgrade from rule-based logic to machine learning models
- Build interactive dashboards for business users
- Add real-time data streaming for live churn monitoring
The vision is to evolve Artemis into a fully AI-powered customer intelligence platform.