The Story Behind FraudGuard AI
Solving the unsolvable class imbalance with Generative AI
What Inspired Us
Fraud detection has a cruel irony. Banks lose billions of dollars every year, yet fraud itself is rare, often making up less than 0.5% of all transactions. This extreme class imbalance creates a paradox: models trained on real-world data become statistically correct but practically useless.
A classifier that predicts “legitimate” for every transaction achieves an accuracy of about 99.5%, yet its fraud recall is exactly 0%. That’s when it clicked. The problem wasn’t weak models; it was weak data. We weren’t failing at fraud detection. We were failing to represent fraud itself.
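A quick sketch makes the trap concrete. The snippet below simulates a 0.5% fraud rate (illustrative only, not our dataset) and scores a do-nothing classifier:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Simulated labels at an illustrative 0.5% fraud rate (1 = fraud, 0 = legitimate)
rng = np.random.default_rng(42)
y_true = (rng.random(100_000) < 0.005).astype(int)

# A "classifier" that predicts legitimate for every single transaction
y_pred = np.zeros_like(y_true)

print(f"Accuracy:     {accuracy_score(y_true, y_pred):.3f}")   # ~0.995
print(f"Fraud recall: {recall_score(y_true, y_pred):.3f}")     # 0.000
```

High accuracy, zero fraud caught: the metric rewards the model for ignoring the very class we care about.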
What We Learned
This project completely reshaped how we think about machine learning in production systems.
Key Learnings:
- Accuracy is a trap in imbalanced datasets.
- Recall matters more than comfort metrics.
- Oversampling techniques like SMOTE assume linearity; they interpolate new fraud points on straight lines between existing ones (see the sketch below).
- Financial behavior is non-linear, correlated, and conditional.
Most importantly, we learned that if the data doesn’t contain enough fraud, you don’t just detect fraud; you simulate it.
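For contrast, here is what classic oversampling looks like: a minimal sketch using imbalanced-learn, with hypothetical file and column names. SMOTE draws each synthetic fraud row on a straight line between two real fraud rows, which is exactly the linearity assumption that breaks down on conditional, correlated transaction features.

```python
import pandas as pd
from imblearn.over_sampling import SMOTE

# Hypothetical transaction table with numeric features and a binary label
df = pd.read_csv("transactions.csv")   # assumed file name
y = df.pop("is_fraud")                 # assumed label column (1 = fraud)

# SMOTE balances the classes by linearly interpolating between each fraud
# row and its nearest fraud neighbours -- no new behaviour is modelled.
X_resampled, y_resampled = SMOTE(random_state=42).fit_resample(df, y)

print(y.value_counts(), y_resampled.value_counts(), sep="\n")
```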
How We Built FraudGuard AI
Instead of creating just another classifier, we designed a two-stage intelligence system.
- Generative Intelligence Layer — CTGAN
We used CTGAN (Conditional Tabular GAN) to learn the true distribution of fraudulent behavior.
Unlike traditional oversampling, CTGAN captures joint probability distributions and preserves the correlations between transaction amount, transaction time, and behavioral features, generating privacy-safe, hyper-realistic synthetic fraud data. Instead of interpolating between existing fraud samples, we model the distribution of fraudulent behavior and draw new examples from it.
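Below is a minimal sketch of the generation step, assuming the open-source `ctgan` package and hypothetical file and column names; the real pipeline also tunes epochs, batch size, and the set of discrete columns.

```python
import pandas as pd
from ctgan import CTGAN

# Hypothetical dataset: keep only the real fraud rows, drop the label
df = pd.read_csv("transactions.csv")                     # assumed file name
fraud_df = df[df["is_fraud"] == 1].drop(columns=["is_fraud"])

# Columns the GAN should treat as categorical (assumed names)
discrete_columns = ["merchant_category", "card_type"]

# Fit the conditional tabular GAN on real fraudulent behaviour ...
synthesizer = CTGAN(epochs=300)
synthesizer.fit(fraud_df, discrete_columns)

# ... then sample privacy-safe synthetic fraud rows to rebalance the training data
synthetic_fraud = synthesizer.sample(5_000)
```

The synthetic rows are appended to the original training set, so the downstream classifier finally sees enough fraud to learn from.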
- Detection & Trust Layer
On top of the augmented dataset, we trained a high-performance classifier and wrapped it in trust.
- Streamlit Dashboard for live interaction
- Model Comparison (Baseline vs Augmented)
- SHAP Explainability for every prediction
Every blocked transaction answers the question: “Why was this flagged?” rather than “Trust me.”
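A rough sketch of how the detection and trust layers fit together, assuming a gradient-boosted classifier and hypothetical file and column names (the Streamlit dashboard wires the same calls into an interactive view):

```python
import pandas as pd
import shap
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split

# Hypothetical augmented dataset: real transactions plus CTGAN-generated fraud rows
df = pd.read_csv("transactions_augmented.csv")   # assumed file name
y = df.pop("is_fraud")                           # assumed label column
X_train, X_test, y_train, y_test = train_test_split(df, y, stratify=y, random_state=42)

# Detection layer: classifier trained on the augmented data
model = XGBClassifier(n_estimators=300, eval_metric="logloss")
model.fit(X_train, y_train)

# Trust layer: SHAP attributes every prediction to individual features,
# so each flagged transaction comes with a concrete "why"
explainer = shap.TreeExplainer(model)
explanation = explainer(X_test)

# Waterfall plot for a single transaction: which features pushed it toward fraud
shap.plots.waterfall(explanation[0])
```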
The Results
Model | Fraud Recall
--- | ---
Baseline (Imbalanced Data) | ~10%
CTGAN-Augmented Model | 70%+
That means roughly seven times better detection. In simpler terms, we didn’t just catch a handful of fraudsters; we caught most of the gang.
Challenges We Faced
Of course, the road wasn’t entirely smooth.
Major Challenges:
- Training GANs on tabular data without mode collapse.
- Ensuring synthetic data didn’t leak sensitive patterns.
- Balancing realism and diversity in generated samples.
- Explaining deep-model decisions to non-technical stakeholders.
Each challenge pushed us to think beyond models and focus on real-world deployment.
Why This Matters
For a mid-sized bank processing 10,000 transactions per day, improving fraud recall from 10% to 70% translates into:
- millions saved annually
- stronger fraud prevention
- explainable, regulator-friendly AI
- a system ready for real-time payment processing
FraudGuard AI is:
- Production-ready
- Explainable
- Privacy-safe
- Future-proof
Final Thought
We didn’t try to outsmart fraudsters; we out-imagined them. By using Generative AI, we trained our system not just on history but on possibility. FraudGuard AI doesn’t just detect fraud; it simulates it to defeat it. Thank you.