SpendGuard AI

Inspiration

At MPC Hacks I competed in the general track, but something Abe said in his closing remarks at the ceremony stuck with me: building a real project for a company beats sending three hundred applications. So after the event I took Brim's public challenge sample data and built this on my own.

The problem is simple. A raw company card statement is thousands of rows of noise, and a finance manager can't act on it directly. I wanted to turn it into something usable, with one rule at the center: every number shown has to be computed from the data, never invented by an AI.

What it does

SpendGuard AI has two features, both built on Brim's real challenge data.

The first is "talk to your spend data." You ask questions in plain English, like "what was our total spend?" or "show spend by month," and the tool answers with the right visualization for the question. Before anything else, it separates genuine merchant charges from account noise like payments, interest, fees, and redemptions, so totals aren't inflated by bank movements.
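
As a rough sketch of that separation step, assuming a hypothetical row shape and simple keyword rules (the real statement columns and matching logic may differ), the idea looks something like this:

```typescript
// Hypothetical row shape; the real statement columns may differ.
interface StatementRow {
  description: string;
  amount: number; // dollars, positive for a charge
}

type RowKind = "merchant" | "payment" | "interest" | "fee" | "redemption";

// Illustrative keyword rules only, not the project's actual rules.
const NOISE_RULES: Array<[RegExp, Exclude<RowKind, "merchant">]> = [
  [/\bpayment\b/i, "payment"],
  [/\binterest\b/i, "interest"],
  [/\bfee\b/i, "fee"],
  [/\bredemption\b/i, "redemption"],
];

// Anything that matches no noise rule is treated as a merchant charge.
function classifyRow(row: StatementRow): RowKind {
  for (const [pattern, kind] of NOISE_RULES) {
    if (pattern.test(row.description)) return kind;
  }
  return "merchant";
}

// Total spend counts merchant charges only, summed in integer cents
// to avoid floating-point drift.
function merchantTotal(rows: StatementRow[]): number {
  const cents = rows
    .filter((row) => classifyRow(row) === "merchant")
    .reduce((sum, row) => sum + Math.round(row.amount * 100), 0);
  return cents / 100;
}
```

With rules like these, a card payment or an annual fee never reaches the total, so "total spend" means merchant spend only.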

The second is a policy reality check. It takes Brim's written expense policy and measures it against what the statement actually shows. The policy says anything over fifty dollars needs manager pre-authorization; measured against the data, about two-thirds of charges cross that line, which is a real gap between the written control and day-to-day operations. A slider lets a manager simulate a different threshold and watch the review pile change in real time. A review queue surfaces duplicate and same-day charge candidates while down-ranking routine permit and toll charges, which are the normal cost of running a trucking fleet, so genuine outliers rise to the top instead of being buried under a wall of permits.
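
The threshold measurement behind the slider can be sketched as a pure function. This is a minimal illustration, assuming a hypothetical `Charge` shape and reading "over fifty dollars" as strictly greater than the threshold:

```typescript
// Hypothetical shape: one merchant charge, in dollars.
interface Charge {
  amount: number;
}

interface ThresholdResult {
  threshold: number;
  total: number;     // number of charges considered
  overCount: number; // charges strictly above the threshold
  overShare: number; // overCount / total, in the range 0..1
}

// Recomputed on every slider change; no state, no model involved.
function measureThreshold(charges: Charge[], threshold: number): ThresholdResult {
  const total = charges.length;
  const overCount = charges.filter((c) => c.amount > threshold).length;
  return {
    threshold,
    total,
    overCount,
    overShare: total === 0 ? 0 : overCount / total,
  };
}
```

Calling `measureThreshold(charges, 50)` and then `measureThreshold(charges, 100)` on the same charges shows how much the review pile shrinks when the threshold moves.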

The truth rule

The part I'm most proud of is the guard. Every figure in the app is produced by a deterministic engine. When the AI writes a sentence to describe a result, that sentence passes through a guard that pulls every number out of it and checks each one against the computed value. If all of them verify, you see the AI wording. If even one can't be verified, the guard rejects it and shows the verified computed template instead. The fallback isn't a failure; it's the guarantee. You are structurally protected from ever seeing a number the system hasn't checked against the source data.
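
To make the guard concrete, here is a minimal sketch of the idea, not the app's actual code. The function names, the number-matching regex, and the half-cent tolerance are all assumptions for illustration:

```typescript
// Pull numeric tokens out of free-form text, e.g. "$18,432.75" or "212".
// Signs and units are ignored in this sketch.
function extractNumbers(text: string): number[] {
  const pattern = /\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?/g;
  const matches = text.match(pattern) ?? [];
  return matches.map((m) => Number(m.replace(/,/g, "")));
}

interface GuardResult {
  text: string;
  source: "ai" | "template";
}

// Show the AI wording only if every number in it matches a computed value;
// otherwise fall back to the template built from the computed values.
function guardNarration(
  aiText: string,
  computedValues: number[],
  template: string,
  tolerance = 0.005, // assumed tolerance: half a cent
): GuardResult {
  const claimed = extractNumbers(aiText);
  const allVerified = claimed.every((n) =>
    computedValues.some((c) => Math.abs(c - n) <= tolerance),
  );
  return allVerified
    ? { text: aiText, source: "ai" }
    : { text: template, source: "template" };
}
```

A narration that says "$18,500" when the engine computed 18432.75 fails the check and the template is shown instead. Note that in this sketch a sentence with no numbers passes trivially, since there is nothing to contradict.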

The same discipline runs through the rest of the tool. I only measure the one policy rule the data can actually support, and mark the others, such as receipts, alcohol, and tips, as not measurable rather than inventing a compliance score. And the system never calls anything fraud. The data has no employee, card, or timestamp identity, so it can't prove intent. These are candidates for a person to review, not accusations.
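
One way to enforce "not measurable" in the types rather than by convention is a discriminated union where only a measured rule can carry a number. This is a sketch under assumed names (`RuleResult`, `describeRule`), not the project's actual model:

```typescript
// A rule is either measured from the data or explicitly not measurable.
// Only the "measured" variant has a numeric field, so a score cannot be
// attached to a rule the data does not support.
type RuleResult =
  | { kind: "measured"; rule: string; shareOver: number } // share in 0..1
  | { kind: "not_measurable"; rule: string; reason: string };

function describeRule(result: RuleResult): string {
  switch (result.kind) {
    case "measured":
      return `${result.rule}: ${(result.shareOver * 100).toFixed(0)}% of charges affected`;
    case "not_measurable":
      return `${result.rule}: not measurable (${result.reason})`;
  }
}
```

For example, a tips rule would be represented with `kind: "not_measurable"` and a short reason string, and the UI has no number to render for it.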

How I built it

Next.js and TypeScript on the front end, with the dataset queried in-browser through sql.js so every figure is computed client-side from the actual statement. The deterministic engine handles all numbers and visualization selection. The AI layer, running on Claude, only ever narrates a result the engine already produced, and only after the guard verifies it. Deployed on Vercel. The public deployment runs in template mode with the same computed numbers; the AI narration layer is enabled separately.

Challenges I ran into

The hardest part was resisting the easy version, where you let the model both compute and describe. The whole value here is that it can't. Designing the guard to parse numbers out of free-form narration and verify each one against the engine, then fail safe to the computed template, took the most iteration. Keeping the policy analysis honest was the other challenge: it would have been easy to score every rule, but most of them aren't measurable from this dataset, and saying so plainly is more useful than a fake number.

Accomplishments that I'm proud of

A system that separates signal from noise, turns a written policy into a live operational check, and structurally cannot show a figure it can't prove. And I built it independently, after the event, on real challenge data, because I'd rather show a working product than send another application.

What I learned

That the interesting engineering problem in an AI product is often the boundary: deciding exactly what the model is and isn't allowed to do, and enforcing it in code rather than trusting a prompt. Verification beats persuasion.

What's next for SpendGuard AI

Widening the set of measurable policy rules as more structured fields become available, adding export for flagged review candidates, and generalizing the guard so it can wrap any computed result, not just the current question set.

A note on the demo

The walkthrough video uses an AI-generated voice (ElevenLabs) for narration. This is a production choice for the demo only. The product itself uses no synthetic media, and as described above, it never presents an AI-generated number as fact.
