Just Brim It
AI-powered agent-swarm expense intelligence for SMBs. Built for Brim Financial x MPC Hacks on real company-card data and the real Brim expense policy, and styled to match Brim Financial's brand and theme.
Inspiration
The Brim challenge: SMBs generate thousands of card transactions a month but have no way to understand their own spending. Brim wanted to change that, and asked us to "make the data talk."
Concretely, the brief asked for four capabilities: talk to your data, digitize & enforce policy, run pre-approvals, and generate expense reports.
We built all four end-to-end on the real data, then extended well past them.
Two things shaped our take. First, we wanted a finance manager (not an analyst) to get answers, catch policy breaches, approve spend, and file reports without touching a spreadsheet. Second, we built around the dimensions the data actually has (card as cost-center, category, merchant, jurisdiction, time). Our bet was that the product is trustworthy because it never invents data.
What it does
- Talk to your data - ask in plain language ("top categories by spend", "USA vs Canada by month") and get the right chart plus a one-line answer, with multi-N tree follow-ups that reuse context.
- Digitize and enforce policy - six rules from the real Brim policy auto-scan every transaction, catch split-charge evasion (two sub-threshold charges that sum over the limit), roll up repeat offenders, and apply contextual AI severity (a legit same-day permit batch is LOW; genuine threshold-ducking is HIGH). Context presets (Quarter-end / Audit / High-travel) scale every rule's threshold before scanning, so the engine adapts to the situation.
- Run an AI pre-approval queue - each request shows the card's history, category budget headroom, and policy flags, plus an AI approve/deny/review recommendation with reasoning.
- Generate CFO-ready expense reports - transactions auto-group by jurisdiction + month with a category breakdown, policy-flag count, and an AI-written summary to export into PDF.
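The split-charge rule and the context presets from the policy bullet above can be sketched as follows. This is a minimal illustration, not the project's rule engine: the `Txn` fields, the `PRESET_SCALE` multipliers, and the same-card/same-merchant/same-day grouping are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical preset multipliers; the real values are not given in the write-up.
PRESET_SCALE = {"default": 1.0, "quarter_end": 0.8, "audit": 0.5, "high_travel": 1.5}


@dataclass(frozen=True)
class Txn:
    card: str
    merchant: str
    day: date
    amount_cad: float


def find_split_charges(txns, limit_cad, preset="default"):
    """Flag same-card, same-merchant, same-day groups where every charge is
    under the limit but the group total is over it."""
    # The preset scales the threshold before any transaction is scanned.
    limit = limit_cad * PRESET_SCALE[preset]

    groups = defaultdict(list)
    for t in txns:
        groups[(t.card, t.merchant, t.day)].append(t)

    flagged = []
    for key, group in groups.items():
        total = sum(t.amount_cad for t in group)
        each_under = all(t.amount_cad < limit for t in group)
        if len(group) >= 2 and each_under and total > limit:
            flagged.append((key, round(total, 2)))
    return flagged


sample = [
    Txn("card-1", "ACME TOOLS", date(2025, 3, 4), 300.0),
    Txn("card-1", "ACME TOOLS", date(2025, 3, 4), 250.0),
    Txn("card-2", "ACME TOOLS", date(2025, 3, 4), 120.0),
]
print(find_split_charges(sample, limit_cad=500.0))
```

With a 500 CAD limit, the two card-1 charges are individually fine but sum to 550, so the group is flagged; the single card-2 charge is not.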
On top of the four required features, we also added:
anomaly & fraud detection, vendor-consolidation savings, probabilistic Monte Carlo forecasting (p10/p50/p90 + budget-overrun probability with an interactive what-if slider), per-category/card volatility scores, receipt matching (AI Vision OCR), per-category budgets, recurring-spend detection, cross-border FX exposure, and spend profiles. A Supermemory knowledge-graph stores each finished Q&A and semantically recalls the relevant past turns before answering.
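To make the Monte Carlo forecast in the list above concrete, here is a rough sketch of how p10/p50/p90 and a budget-overrun probability could be computed. It assumes next month's spend is roughly normal around the historical mean, which is a simplification chosen for the example; the sidecar uses numpy, while this version sticks to the standard library. The function name, the `what_if` multiplier, and the sample history are hypothetical.

```python
import random
import statistics


def forecast_month(history, budget, n_sims=5000, what_if=1.0, seed=0):
    """Simulate next month's spend by drawing from a normal distribution
    fitted to past monthly totals, scaled by a what-if multiplier."""
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    mean = statistics.fmean(history)
    sd = statistics.stdev(history)

    # Spend cannot be negative, so clamp each draw at zero before scaling.
    sims = sorted(max(0.0, rng.gauss(mean, sd)) * what_if for _ in range(n_sims))

    def pct(p):
        # Nearest-rank percentile over the sorted simulations.
        return sims[min(n_sims - 1, int(p * n_sims))]

    p_overrun = sum(s > budget for s in sims) / n_sims
    return {"p10": pct(0.10), "p50": pct(0.50), "p90": pct(0.90), "p_overrun": p_overrun}


monthly_totals = [42000, 39500, 45100, 41000, 47800, 43900]  # made-up history
print(forecast_month(monthly_totals, budget=45000))
print(forecast_month(monthly_totals, budget=45000, what_if=1.2))  # slider at +20%
```

Moving the what-if slider only rescales the simulated draws, so the percentiles and the overrun probability can be recomputed without touching the underlying data.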
Two things set it apart: a multi-agent reasoning layer (a Python LangGraph swarm) and on-chain tamper-proof audit anchoring (Solana). Critical compliance alerts can even place an interactive phone call (ElevenLabs + Twilio).
How we built it 🛠️
Deterministic core, AI augmentation. Every number comes from parameterized SQL over the real data; the AI only interprets, judges, and narrates, and never has DB access or writes SQL. That makes the output trustworthy and makes AI failures non-fatal.
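The idea of a deterministic core can be illustrated with one whitelisted query tool: the model may pick a dimension from a fixed set, but the SQL itself is fixed and all values are bound as parameters. This is a sketch under assumptions, written in Python with the standard `sqlite3` module; the column names, the `spend_by` function, and the sample rows are hypothetical and not taken from lib/queries.ts.

```python
import sqlite3

# Whitelist of group-by dimensions a tool call may request (assumed column names).
ALLOWED_DIMENSIONS = {"category", "merchant", "jurisdiction", "card"}


def spend_by(conn, dimension, since, limit=5):
    """Return the top spend totals grouped by a whitelisted dimension."""
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimension not allowed: {dimension!r}")
    if not 1 <= limit <= 50:
        raise ValueError("limit out of range")
    # The column name can only come from the whitelist; values are bound with "?".
    sql = (
        f"SELECT {dimension}, ROUND(SUM(amount_cad), 2) AS total "
        "FROM transactions WHERE txn_date >= ? "
        f"GROUP BY {dimension} ORDER BY total DESC LIMIT ?"
    )
    return conn.execute(sql, (since, limit)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (card TEXT, merchant TEXT, category TEXT, "
    "jurisdiction TEXT, txn_date TEXT, amount_cad REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("card-1", "SHELL", "Fuel", "ON", "2025-03-02", 120.50),
        ("card-1", "PETRO", "Fuel", "ON", "2025-03-09", 80.25),
        ("card-2", "DELTA", "Travel", "NY", "2025-03-11", 640.00),
        ("card-2", "DELTA", "Travel", "NY", "2024-12-30", 999.00),
    ],
)
print(spend_by(conn, "category", since="2025-01-01"))
# prints [('Travel', 640.0), ('Fuel', 200.75)]
```

Because the AI only chooses among validated arguments, a malformed or hostile tool call fails validation instead of reaching the database.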
Data spine: a shared ingest pipeline (lib/ingest.ts) normalizes the xlsx and any uploaded CSV/XLSX into a wide transactions table (MCC-derived categories, normalized merchants, signed CAD, cross-border/round-number flags), so the read path is pure aggregation. One query layer (lib/queries.ts) backs both the dashboard and the AI tools, so they can never disagree.
AI layer (Google Gemini, @google/genai): a function-calling agent loop for chat (5 whitelisted, zod-validated query tools, auto-viz, lineage trail) plus bounded single-call JSON passes for severity triage, approval recs, and report summaries, all behind a per-model 429/404 fallback chain.
Multi-agent sidecar (Python): FastAPI + LangGraph + langchain (Gemini and OpenAI) + Pydantic. Four swarms: approval debate (Prosecutor and Defender in parallel, then a Judge), per-suspect fraud investigators, a compliance reviewer + false-positive challenger, and an insights multi-lens sweep (4 lenses, then a ranker). Stateless: the TS routes gather context, call it, and persist results + per-agent traces.
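The shape of the approval debate (two agents in parallel, then a judge) can be sketched without any LLM or graph library. The example below uses a thread pool from the standard library in place of LangGraph, and the three agents are rule-based stubs standing in for model calls; the request fields and verdict logic are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def prosecutor(request):
    """Stub agent arguing against approval (a real agent would call an LLM)."""
    points = []
    if request["amount_cad"] > request["budget_headroom_cad"]:
        points.append("exceeds remaining category budget")
    if request["policy_flags"]:
        points.append(f"{len(request['policy_flags'])} open policy flag(s)")
    return {"role": "prosecutor", "points": points}


def defender(request):
    """Stub agent arguing for approval."""
    points = []
    if request["prior_approved"] > 0:
        points.append("card has prior approved spend in this category")
    return {"role": "defender", "points": points}


def judge(arguments):
    """Fan-in step: weigh both sides and keep the per-agent trace."""
    against = len(arguments["prosecutor"]["points"])
    in_favour = len(arguments["defender"]["points"])
    if against == 0:
        verdict = "approve"
    elif against > in_favour:
        verdict = "deny"
    else:
        verdict = "review"
    return {"verdict": verdict, "trace": arguments}


def run_debate(request):
    # Fan-out: both advocates see the same context and run concurrently.
    agents = {"prosecutor": prosecutor, "defender": defender}
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(fn, request) for name, fn in agents.items()}
        arguments = {name: fut.result() for name, fut in futures.items()}
    return judge(arguments)


request = {
    "amount_cad": 900.0,
    "budget_headroom_cad": 500.0,
    "policy_flags": ["split-charge"],
    "prior_approved": 0,
}
print(run_debate(request)["verdict"])
```

Keeping the function stateless mirrors the sidecar design described above: the caller gathers context, passes it in, and stores the verdict and trace itself.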
On-chain audit (Solana): @solana/web3.js + @solana/spl-memo write a SHA-256 of each approved record into a devnet Memo transaction (server keypair, server-only), with a Verify action that re-hashes the live record to detect tampering.
Voice alerts: ElevenLabs Conversational AI over Twilio for critical alerts.
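The hash-then-verify step of the on-chain audit can be shown in isolation. The sketch below covers only the hashing and the tamper check, not the Solana Memo write, and it is in Python rather than the app's TypeScript. The canonical JSON encoding (sorted keys, compact separators) and the record fields are assumptions; the write-up does not say how records are serialized before hashing.

```python
import hashlib
import json


def record_digest(record):
    """SHA-256 hex digest of a canonical JSON encoding of the record."""
    # Sorted keys and fixed separators make the bytes independent of key order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(live_record, anchored_digest):
    """Re-hash the live record and compare it with the digest stored on-chain."""
    return record_digest(live_record) == anchored_digest


approval = {"id": "req-17", "card": "card-1", "amount_cad": "412.50", "decision": "approved"}
anchored = record_digest(approval)  # this hex string is what would go into the memo

tampered = dict(approval, amount_cad="4120.50")
print(verify(approval, anchored), verify(tampered, anchored))
```

Amounts are kept as strings here so that the digest does not depend on floating-point formatting.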
Cross-session memory (Supermemory knowledge-graph): lib/supermemory.ts wraps Supermemory's API: after each Ask-AI turn we store the Q&A under a container tag (POST /v3/documents), and before the next question we semantically recall the most relevant past turns (POST /v3/search) and inject them into the system prompt, so the assistant carries context across sessions. It's the same knowledge-graph idea applied to a finance chat: turns become nodes, recall pulls the related ones.
Challenges we ran into 😮💨
The data did not match the brief. No employees or departments. We reframed around the real dimensions and taught the AI to explain it, rather than fake an org chart.
Free-tier quota cannot feed a swarm. Gemini's ~20 requests/minute free limit made the multi-agent layer time out and degrade. We added an OpenAI provider (gpt-4o-mini) to the sidecar so the swarm actually completes.
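A fallback chain of the kind mentioned for the AI layer (move to the next model on 429 or 404) might look like the sketch below. The `ModelError` class, the `call` callback, and the model names in the usage are hypothetical stand-ins; no real SDK is involved, and how the project actually wires Gemini and OpenAI together is not shown in the write-up.

```python
class ModelError(Exception):
    """Hypothetical error carrying the HTTP status of a failed model call."""

    def __init__(self, status):
        super().__init__(f"model call failed with HTTP {status}")
        self.status = status


# Quota exhausted (429) or model not available (404): try the next one.
FALLBACK_STATUSES = {429, 404}


def call_with_fallback(prompt, models, call):
    """Try each model in order; return (model, answer) from the first success.

    Returns (None, None) when every model is unavailable, so the caller can
    fall back to its deterministic baseline instead of failing.
    """
    for model in models:
        try:
            return model, call(model, prompt)
        except ModelError as err:
            if err.status not in FALLBACK_STATUSES:
                raise  # unexpected failures should surface, not be masked
    return None, None


def fake_call(model, prompt):
    # Stand-in for a provider SDK call: the first model is rate-limited.
    if model == "model-a":
        raise ModelError(429)
    return f"{model} answered: {prompt}"


print(call_with_fallback("summarize March spend", ["model-a", "model-b"], fake_call))
```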
Keeping AI bounded and trustworthy. Whitelisted enum tools + zod validation make the chat injection-proof; bounded JSON passes keep cost predictable; graceful fallback keeps the app working with no key at all.
Robustness of the swarm. One bad record (a null id) once 500'd a whole batch; we added per-item isolation, safe parsing, and a 503 handler so a single failure can't take down a run.
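Per-item isolation of the kind described above can be as small as a try/except around each record. This is a sketch with a hypothetical `process_batch` helper and made-up record fields, not the sidecar's actual handler.

```python
def process_batch(records, handler):
    """Run handler on each record; collect failures instead of aborting the batch."""
    results, failures = [], []
    for index, record in enumerate(records):
        try:
            # Safe parsing: reject malformed items before any real work happens.
            if not isinstance(record, dict) or record.get("id") is None:
                raise ValueError("record is missing an id")
            results.append({"id": record["id"], "result": handler(record)})
        except Exception as err:  # isolate every per-item failure
            failures.append({"index": index, "error": str(err)})
    return {"results": results, "failures": failures}


batch = [
    {"id": 1, "amount": "10.5"},
    {"id": None, "amount": "5"},     # the kind of null id that once broke a run
    {"id": 3, "amount": "not-a-number"},
]
outcome = process_batch(batch, lambda r: float(r["amount"]) * 2)
print(outcome)
```

The good record is processed, and the two bad ones are reported by position rather than turning the whole batch into an error response.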
Real-world friction. Solana devnet faucet IP rate-limits, and recurring Next.js .next build-cache corruption from running several dev servers at once.
Accomplishments that we're proud of
- All four required features work end-to-end on the real data.
- A genuine multi-agent layer (parallel fan-out, map-reduce, fan-in via LangGraph), not a single-prompt wrapper, with live visualization of the swarm.
- On-chain tamper-evidence for financial approvals, with a real Explorer link and a working tamper demo.
- Honest data handling turned into a feature: the AI surfaces what dimensions exist instead of pretending.
- Graceful degradation at every layer - no sidecar, no Gemini key, no OpenAI key, no Solana wallet: each feature falls back to a deterministic baseline and never throws into a user flow.
What we learned 🤓
- Deterministic core + AI augmentation is the pattern that makes AI output trustworthy and its failures survivable.
- Multi-agent reasoning needs real quota - the architecture was right, but the free tier could not feed it.
- LangGraph is worth it when you actually need parallel fan-out and joins; below that, a single bounded call is simpler.
- Plan for failure first - per-item isolation and fallback chains helped us finish in time.
What's next for Just Brim It 💥
- A live bank feed instead of file uploads, and a richer policy editor (author new rules from the UI, scoped to cards/categories/people).
- Production keys (billing-enabled Gemini/OpenAI) and a deployed sidecar so the swarm runs in production, plus an optional mainnet anchoring tier.
- More agent lenses (tax/IFTA, duplicate-vendor negotiation) and receipt auto-capture from email.
Technologies & libraries 💻📚
Core app: Next.js, React 18, TypeScript, Node.
Data: SQLite, zod for validation.
AI: Google Gemini (@google/genai, function-calling + JSON passes, per-model fallback);
OpenAI (gpt-4o-mini, via the sidecar).
Multi-agent sidecar (Python): FastAPI, Uvicorn, LangGraph, langchain-google-genai,
langchain-openai, Pydantic, numpy (Monte Carlo forecast simulation); uv for env/deps; pytest.
Memory: Supermemory knowledge-graph store + semantic recall.
Blockchain: Solana, @solana/web3.js + @solana/spl-memo.
Voice: ElevenLabs Conversational AI + Twilio.
Built With
- next.js
- python
- react
- recharts
- tailwind
- typescript