Just Brim It

AI-powered agent-swarm expense intelligence for SMBs. Built for Brim Financial x MPC Hacks on real company-card data and the real Brim expense policy, with a UI consistent with Brim Financial's brand and theme.


Inspiration

The Brim challenge: SMBs generate thousands of card transactions a month but have no way to understand their own spending. Brim wanted to change that, and asked us to "make the data talk."

Concretely, the brief asked for four capabilities: talk to your data, digitize and enforce policy, run pre-approvals, and generate expense reports.

We built all four end-to-end on the real data, then extended well past them.

Two things shaped our take. First, we wanted a finance manager (not an analyst) to get answers, catch policy breaches, approve spend, and file reports without touching a spreadsheet. Second, we built around the dimensions the data actually has (card as cost-center, category, merchant, jurisdiction, time). Our take was that the product is trustworthy because it never invents data.

What it does

  1. Talk to your data - ask in plain English ("top categories by spend", "USA vs Canada by month") and get the right chart plus a one-line answer, with multi-turn follow-ups that reuse context.
  2. Digitize and enforce policy - six rules from the real Brim policy auto-scan every transaction, catch split-charge evasion (two sub-threshold charges that sum over the limit), roll up repeat offenders, and apply contextual AI severity (a legit same-day permit batch is LOW; genuine threshold-ducking is HIGH). Context presets (Quarter-end / Audit / High-travel) scale every rule's threshold before scanning, so the engine adapts to the situation.
  3. Run an AI pre-approval queue - each request shows the card's history, category budget headroom, and policy flags, plus an AI approve/deny/review recommendation with reasoning.
  4. Generate CFO-ready expense reports - transactions auto-group by jurisdiction + month with a category breakdown, policy-flag count, and an AI-written summary, ready to export to PDF.
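
The split-charge rule and the context presets from item 2 can be illustrated with a minimal sketch. It assumes charges are grouped by card, merchant, and day; the `Txn` fields, the `PRESET_SCALE` multipliers, and the 50.00 limit are hypothetical and are not taken from the real Brim policy or from our codebase.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Hypothetical preset multipliers; the real preset values are not listed in this writeup.
PRESET_SCALE = {"default": 1.0, "quarter_end": 0.8, "audit": 0.5, "high_travel": 1.5}


@dataclass(frozen=True)
class Txn:
    card: str
    merchant: str
    day: date
    amount: float


def find_split_charges(txns, limit, preset="default"):
    """Flag same-day groups where each charge is under the scaled limit
    but the charges together exceed it."""
    threshold = limit * PRESET_SCALE[preset]  # presets scale the threshold before scanning
    groups = defaultdict(list)
    for t in txns:
        groups[(t.card, t.merchant, t.day)].append(t.amount)

    flagged = []
    for (card, merchant, day), amounts in groups.items():
        total = sum(amounts)
        if len(amounts) >= 2 and max(amounts) < threshold and total > threshold:
            flagged.append((card, merchant, day, total))
    return flagged


txns = [
    Txn("card-1", "ACME TOOLS", date(2024, 3, 4), 30.0),
    Txn("card-1", "ACME TOOLS", date(2024, 3, 4), 30.0),
    Txn("card-2", "ACME TOOLS", date(2024, 3, 4), 40.0),
]
flags = find_split_charges(txns, limit=50.0)
```

In this sketch a deterministic pass only produces candidates; deciding whether a flagged group is a legitimate batch or genuine threshold-ducking is left to the AI severity step described above.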

On top of the four required features, we also added:

Anomaly & fraud detection, vendor-consolidation savings, probabilistic Monte Carlo forecasting (p10/p50/p90 + budget-overrun probability with an interactive what-if slider), per-category/card volatility scores, receipt matching (AI Vision OCR), per-category budgets, recurring-spend detection, cross-border FX exposure, and spend profiles. A Supermemory knowledge graph stores each finished Q&A and semantically recalls the relevant past turns before answering.
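
To make the forecasting idea concrete, here is a minimal sketch of a Monte Carlo forecast that reports p10/p50/p90 and a budget-overrun probability. It is an assumption-laden illustration, not our implementation: it uses the standard-library `random` module rather than numpy, resamples historical month-over-month ratios (a simple bootstrap), and the `forecast` function, its `what_if` multiplier, and the sample totals are all hypothetical.

```python
import random


def forecast(monthly_totals, budget, what_if=1.0, runs=5000, seed=0):
    """Simulate next month's spend by resampling past month-over-month ratios."""
    rng = random.Random(seed)  # seeded so repeated calls give the same answer
    ratios = [b / a for a, b in zip(monthly_totals, monthly_totals[1:])]
    last = monthly_totals[-1]

    # Each run picks one historical ratio; what_if stands in for the slider.
    sims = sorted(last * rng.choice(ratios) * what_if for _ in range(runs))

    def pct(p):
        return sims[min(runs - 1, int(p * runs))]

    return {
        "p10": pct(0.10),
        "p50": pct(0.50),
        "p90": pct(0.90),
        "overrun_probability": sum(s > budget for s in sims) / runs,
    }


# Hypothetical monthly totals in CAD.
result = forecast([100.0, 110.0, 105.0, 120.0], budget=125.0)
```

Fixing the seed keeps the numbers stable between page loads, which matters when the same forecast is shown on a dashboard and quoted in a report.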

Two things set it apart: a multi-agent reasoning layer (a Python LangGraph swarm) and on-chain tamper-proof audit anchoring (Solana). Critical compliance alerts can even place an interactive phone call (ElevenLabs + Twilio).

How we built it 🛠️

Deterministic core, AI augmentation. Every number comes from parameterized SQL over the real data; the AI only interprets, judges, and narrates, and never has DB access or writes SQL. That makes the output trustworthy and makes AI failures non-fatal.
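
A minimal sketch of that boundary, written in Python with `sqlite3` for brevity (our query layer is TypeScript with zod): the model can only name a whitelisted tool and pass arguments, and values reach SQL as bound parameters. The table, columns, tool name, and sample rows below are hypothetical.

```python
import sqlite3

# Column names cannot be bound as SQL parameters, so they come from a whitelist.
ALLOWED_GROUP_BY = {"category", "merchant", "jurisdiction"}


def spend_by(conn, group_by, min_amount=0.0):
    """Aggregate spend by a whitelisted dimension; values are bound, never interpolated."""
    if group_by not in ALLOWED_GROUP_BY:
        raise ValueError(f"group_by must be one of {sorted(ALLOWED_GROUP_BY)}")
    sql = (
        f"SELECT {group_by}, SUM(amount_cad) FROM transactions "
        f"WHERE amount_cad >= ? GROUP BY {group_by} ORDER BY 2 DESC"
    )
    return conn.execute(sql, (float(min_amount),)).fetchall()


TOOLS = {"spend_by": spend_by}


def call_tool(conn, name, args):
    """Dispatch a model-requested tool call; unknown tool names are rejected."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](conn, **args)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(category TEXT, merchant TEXT, jurisdiction TEXT, amount_cad REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?, ?)",
    [
        ("fuel", "PETRO-X", "CA", 100.0),
        ("fuel", "PETRO-X", "US", 50.0),
        ("meals", "DINER-Y", "CA", 40.0),
    ],
)
rows = call_tool(conn, "spend_by", {"group_by": "category"})
```

Because the model never supplies SQL text, a malformed or hostile tool argument fails validation instead of reaching the database.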

  • Data spine: a shared ingest pipeline (lib/ingest.ts) normalizes the xlsx and any uploaded CSV/XLSX into a wide transactions table (MCC-derived categories, normalized merchants, signed CAD, cross-border/round-number flags), so the read path is pure aggregation. One query layer (lib/queries.ts) backs both the dashboard and the AI tools, so they can never disagree.

  • AI layer (Google Gemini, @google/genai): a function-calling agent loop for chat (5 whitelisted, zod-validated query tools, auto-viz, lineage trail) plus bounded single-call JSON passes for severity triage, approval recs, and report summaries, all behind a per-model 429/404 fallback chain.

  • Multi-agent sidecar (Python): FastAPI + LangGraph + LangChain (Gemini and OpenAI) + Pydantic. Four swarms: approval debate (Prosecutor and Defender in parallel, then a Judge), per-suspect fraud investigators, a compliance reviewer + false-positive challenger, and an insights multi-lens sweep (4 lenses, then a ranker). Stateless: the TS routes gather context, call it, and persist results + per-agent traces.

  • On-chain audit (Solana): @solana/web3.js + @solana/spl-memo write a SHA-256 of each approved record into a devnet Memo transaction (server keypair, server-only), with a Verify action that re-hashes the live record to detect tampering.

  • Voice alerts: ElevenLabs Conversational AI over Twilio for critical alerts.

  • Cross-session memory (Supermemory knowledge-graph): lib/supermemory.ts wraps Supermemory's API. After each Ask-AI turn we store the Q&A under a container tag (POST /v3/documents), and before the next question we semantically recall the most relevant past turns (POST /v3/search) and inject them into the system prompt, so the assistant carries context across sessions. It's the same knowledge-graph idea applied to a finance chat: turns become nodes, recall pulls the related ones.
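
The hash-and-verify half of the on-chain audit bullet can be sketched without touching Solana at all. This is an illustration under assumptions: the record fields are hypothetical, canonical JSON with sorted keys is one possible serialization (not necessarily the one we used), and the anchored digest would in practice be read back from the Memo transaction rather than kept in a variable.

```python
import hashlib
import json


def record_digest(record):
    """SHA-256 over a canonical JSON form, so key order cannot change the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(record, anchored_digest):
    """Re-hash the live record and compare it with the digest anchored at approval time."""
    return record_digest(record) == anchored_digest


# Hypothetical approved record.
approved = {"id": "req-42", "card": "card-1", "amount_cad": 180.0, "decision": "approve"}
anchored = record_digest(approved)  # this hex string is what a memo would carry

tampered = dict(approved, amount_cad=1800.0)
```

Only the digest leaves the server, so the memo proves the record existed in that exact form without publishing any transaction details.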

Challenges we ran into 😮‍💨

  • The data did not match the brief. No employees or departments. We reframed around the real dimensions and taught the AI to explain it, rather than fake an org chart.

  • Free-tier quota cannot feed a swarm. Gemini's ~20 requests/minute free limit made the multi-agent layer time out and degrade. We added an OpenAI provider (gpt-4o-mini) to the sidecar so the swarm actually completes.

  • Keeping AI bounded and trustworthy. Whitelisted enum tools + zod validation make the chat injection-proof; bounded JSON passes keep cost predictable; graceful fallback keeps the app working with no key at all.

  • Robustness of the swarm. One bad record (a null id) once 500'd a whole batch; we added per-item isolation, safe parsing, and a 503 handler so a single failure can't take down a run.

  • Real-world friction. Solana devnet faucet IP rate limits, and recurring Next.js .next build-cache corruption from running several dev servers at once.
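
The per-item isolation fix from the swarm-robustness bullet looks roughly like the sketch below. The `run_batch` name, the record shape, and the toy handler are hypothetical; the point is only that one bad record ends up in an error list instead of aborting the batch.

```python
def run_batch(items, handler):
    """Process each item independently so one failure cannot sink the whole run."""
    results, errors = [], []
    for index, item in enumerate(items):
        try:
            if item.get("id") is None:
                raise ValueError("record has no id")
            results.append({"id": item["id"], "result": handler(item)})
        except Exception as exc:  # isolate the failure to this item
            errors.append({"index": index, "error": str(exc)})
    return {"results": results, "errors": errors}


# Toy handler standing in for a per-record agent call.
def score(item):
    return item["amount"] * 2


batch = [
    {"id": "a", "amount": 10},
    {"id": None, "amount": 5},  # the kind of record that once broke a whole batch
    {"id": "c", "amount": 7},
]
outcome = run_batch(batch, score)
```

Returning the errors alongside the results lets the caller persist what succeeded and surface what did not.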

Accomplishments that we're proud of

  • All four required features work end-to-end on the real data.
  • A genuine multi-agent layer (parallel fan-out, map-reduce, fan-in via LangGraph), not a single-prompt wrapper, with live visualization of the swarm.
  • On-chain tamper-evidence for financial approvals, with a real Explorer link and a working tamper demo.
  • Honest data handling turned into a feature: the AI surfaces what dimensions exist instead of pretending.
  • Graceful degradation at every layer - no sidecar, no Gemini key, no OpenAI key, no Solana wallet: each feature falls back to a deterministic baseline and never throws into a user flow.
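
One way to picture the graceful-degradation bullet is a fallback chain that ends in a deterministic baseline. The sketch below is hypothetical throughout: `ModelError`, the model names, and the fake client are stand-ins, and the only behavior borrowed from the writeup is that 429 and 404 responses move on to the next model.

```python
class ModelError(Exception):
    """Stand-in for a provider error that carries an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"model call failed with status {status}")
        self.status = status


RETRYABLE = {429, 404}  # rate-limited or model not found: try the next model


def summarize(prompt, models, call_model, baseline):
    """Try each model in order; fall back to a deterministic baseline if none answers."""
    for model in models:
        try:
            return {"source": model, "text": call_model(model, prompt)}
        except ModelError as exc:
            if exc.status not in RETRYABLE:
                break  # unexpected failure: stop retrying and use the baseline
    return {"source": "baseline", "text": baseline()}


# Fake client: the first (hypothetical) model is rate-limited, the second answers.
def fake_call(model, prompt):
    if model == "model-a":
        raise ModelError(429)
    return f"{model}: summary of {prompt}"


def baseline():
    return "3 transactions, 1 policy flag"


answer = summarize("March report", ["model-a", "model-b"], fake_call, baseline)
no_key = summarize("March report", [], fake_call, baseline)  # no key configured
```

With an empty model list the function never raises and simply returns the baseline text, which matches the "never throws into a user flow" goal.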

What we learned 🤓

  • Deterministic core + AI augmentation is the pattern that makes AI output trustworthy and its failures survivable.
  • Multi-agent reasoning needs real quota - the architecture was right, but the free tier could not feed it.
  • LangGraph is very efficient when you actually need parallel fan-out and joins; below that, a single bounded call is simpler.
  • Plan for failure first - per-item isolation and fallback chains helped us finish in time.

What's next for Just Brim It 💥

  • A live bank feed instead of file uploads, and a richer policy editor (author new rules from the UI, scoped to cards/categories/people).
  • Production keys (billing-enabled Gemini/OpenAI) and a deployed sidecar so the swarm runs in production, plus an optional mainnet anchoring tier.
  • More agent lenses (tax/IFTA, duplicate-vendor negotiation) and receipt auto-capture from email.

Technologies & libraries 💻📚

  • Core app: Next.js, React 18, TypeScript, Node.
  • Data: SQLite, zod for validation.
  • AI: Google Gemini (@google/genai, function-calling + JSON passes, per-model fallback); OpenAI (gpt-4o-mini, via the sidecar).
  • Multi-agent sidecar (Python): FastAPI, Uvicorn, LangGraph, langchain-google-genai, langchain-openai, Pydantic, numpy (Monte Carlo forecast simulation); uv for env/deps; pytest.
  • Memory: Supermemory knowledge-graph store + semantic recall.
  • Blockchain: Solana (@solana/web3.js + @solana/spl-memo).
  • Voice: ElevenLabs Conversational AI + Twilio.

