The Story Behind FraudGuard AI
Solving the unsolvable class imbalance with Generative AI
What Inspired Us
Fraud detection has a cruel irony. Banks lose billions of dollars every year, yet fraud itself is rare, often making up less than 0.5% of all transactions. This extreme class imbalance creates a paradox: models trained on real-world data become statistically correct but practically useless.
A classifier that predicts “legitimate” for every transaction achieves an accuracy of about 99.5%, yet its fraud recall is exactly 0%. That’s when it clicked. The problem wasn’t weak models; it was weak data. We weren’t failing at fraud detection. We were failing to represent fraud itself.
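A quick sketch makes the trap concrete. The snippet below simulates a 0.5% fraud rate (illustrative only, not our dataset) and scores a do-nothing classifier:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Simulated labels at an illustrative 0.5% fraud rate (1 = fraud, 0 = legitimate)
rng = np.random.default_rng(42)
y_true = (rng.random(100_000) < 0.005).astype(int)

# A "classifier" that predicts legitimate for every single transaction
y_pred = np.zeros_like(y_true)

print(f"Accuracy:     {accuracy_score(y_true, y_pred):.3f}")   # ~0.995
print(f"Fraud recall: {recall_score(y_true, y_pred):.3f}")     # 0.000
```

High accuracy, zero fraud caught: the metric rewards the model for ignoring the very class we care about.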
What We Learned
This project completely reshaped how we think about machine learning in production systems.
Key Learnings:
- Accuracy is a trap in imbalanced datasets.
- Recall matters more than comfort metrics.
- Oversampling techniques like SMOTE assume linearity; they interpolate new fraud points on straight lines between existing ones (see the sketch below).
- Financial behavior is non-linear, correlated, and conditional.
Most importantly, we learned that if the data doesn’t contain enough fraud, you don’t just detect fraud; you simulate it.
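For contrast, here is what classic oversampling looks like: a minimal sketch using imbalanced-learn, with hypothetical file and column names. SMOTE draws each synthetic fraud row on a straight line between two real fraud rows, which is exactly the linearity assumption that breaks down on conditional, correlated transaction features.

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

# Hypothetical transaction table with numeric features and a binary label
df = pd.read_csv("transactions.csv")   # assumed file name
y = df.pop("is_fraud")                 # assumed label column (1 = fraud)

# SMOTE balances the classes by linearly interpolating between each fraud
# row and its nearest fraud neighbours -- no new behaviour is modelled.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(df, y)

print(y.value_counts(), y_resampled.value_counts(), sep="\n")
```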
How We Built FraudGuard AI
Instead of creating just another classifier, we designed a two-stage intelligence system.
- Generative Intelligence Layer — CTGAN
We used CTGAN (Conditional Tabular GAN) to learn the true distribution of fraudulent behavior.
Unlike traditional oversampling, CTGAN captures joint probability distributions and preserves the correlations between transaction amount, transaction time, and behavioral features, generating privacy-safe, hyper-realistic synthetic fraud data. Instead of interpolating between existing fraud samples, we model the distribution of fraudulent behavior and draw new examples from it.
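Below is a minimal sketch of the generation step, assuming the open-source `ctgan` package and hypothetical file and column names; the real pipeline also tunes epochs, batch size, and the set of discrete columns.

```python
import pandas as pd
from ctgan import CTGAN

# Hypothetical dataset: keep only the real fraud rows, drop the label
df = pd.read_csv("transactions.csv")                     # assumed file name
fraud_df = df[df["is_fraud"] == 1].drop(columns=["is_fraud"])

# Columns the GAN should treat as categorical (assumed names)
discrete_columns = ["merchant_category", "card_type"]

# Fit the conditional tabular GAN on real fraudulent behaviour ...
synthesizer = CTGAN(epochs=300)
synthesizer.fit(fraud_df, discrete_columns)

# ... then sample privacy-safe synthetic fraud rows to rebalance the training data
synthetic_fraud = synthesizer.sample(5_000)
```

The synthetic rows are appended to the original training set, so the downstream classifier finally sees enough fraud to learn from.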
- Detection & Trust Layer
On top of the augmented dataset, we trained a high-performance classifier and wrapped it in trust.
- Streamlit Dashboard for live interaction
- Model Comparison (Baseline vs Augmented)
- SHAP Explainability for every prediction
Every blocked transaction answers the question: “Why was this flagged?” rather than “Trust me.”
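A rough sketch of how the detection and trust layers fit together, assuming a gradient-boosted classifier and hypothetical file and column names (the Streamlit dashboard wires the same calls into an interactive view):

```python
import pandas as pd
import shap
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# Hypothetical augmented dataset: real transactions plus CTGAN-generated fraud rows
df = pd.read_csv("transactions_augmented.csv")   # assumed file name
y = df.pop("is_fraud")                           # assumed label column
X_train, X_test, y_train, y_test = train_test_split(df, y, stratify=y, random_state=42)

# Detection layer: classifier trained on the augmented data
model = XGBClassifier(n_estimators=300, eval_metric="logloss")
model.fit(X_train, y_train)

# Trust layer: SHAP attributes every prediction to individual features,
# so each flagged transaction comes with a concrete "why"
explainer = shap.TreeExplainer(model)
explanation = explainer(X_test)

# Waterfall plot for a single transaction: which features pushed it toward fraud
shap.plots.waterfall(explanation[0])
```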
The Results
Model | Fraud Recall
--- | ---
Baseline (Imbalanced Data) | ~10%
CTGAN-Augmented Model | 70%+
That means roughly seven times better detection. In simpler terms, we didn’t just catch a handful of fraudsters; we caught most of the gang.
Challenges We Faced
Of course, the road wasn’t entirely smooth.
Major Challenges:
- Training GANs on tabular data without mode collapse.
- Ensuring synthetic data didn’t leak sensitive patterns.
- Balancing realism and diversity in generated samples.
- Explaining deep-model decisions to non-technical stakeholders.
Each challenge pushed us to think beyond models and focus on real-world deployment.
Why This Matters
For a mid-sized bank processing 10,000 transactions per day, improving fraud recall from 10% to 70% translates into:
- millions saved annually
- stronger fraud prevention
- explainable, regulator-friendly AI
- a system ready for real-time payment processing
FraudGuard AI is:
- Production-ready
- Explainable
- Privacy-safe
- Future-proof
Final Thought
We didn’t try to outsmart fraudsters; we out-imagined them. By using Generative AI, we trained our system not just on history but on possibility. FraudGuard AI doesn’t just detect fraud; it simulates it to defeat it. Thank you.