Inspiration

Game studios spend millions on playtesting but walk away with session recordings and click logs. They know a player quit at level 3. They have no idea why. We wanted to build the tool that answers the "why" by measuring what players actually feel, not just what they do.

What it does

PatchLab is a playtesting platform that captures player emotion in real time. A lightweight desktop agent records gameplay, webcam feed, and Apple Watch biometrics simultaneously. That data gets processed through a backend pipeline that fuses facial emotion scores (frustration, confusion, delight, boredom), heart rate, and HRV into a single per-second timeline.

The key feature is developer intent. Before a session starts, the developer annotates each game state with the emotion they intended players to feel. PatchLab then computes an intent delta for every second of play and generates a Playtest Health Score: green, yellow, or red per game state.
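
The delta computation can be sketched in a few lines. This is an illustrative reconstruction, not PatchLab's actual code: the emotion names come from the writeup, but treating intent as a per-emotion band, using mean absolute out-of-band deviation, and the green/yellow/red thresholds are all assumptions.

```python
from statistics import mean

# Emotion channels named in the writeup; bands and thresholds below
# are illustrative assumptions, not PatchLab's real values.
EMOTIONS = ("frustration", "confusion", "delight", "boredom")

def intent_delta(measured: dict, intended: dict) -> float:
    """Mean out-of-band deviation between measured emotion scores (0-1)
    and the developer's intended range for each emotion."""
    total = 0.0
    for emotion in EMOTIONS:
        lo, hi = intended[emotion]
        x = measured[emotion]
        # Deviation is zero while the signal stays inside the band.
        total += max(lo - x, 0.0, x - hi)
    return total / len(EMOTIONS)

def health_score(per_second_deltas: list) -> str:
    """Collapse the per-second deltas for one game state into a verdict."""
    avg = mean(per_second_deltas)
    if avg < 0.05:
        return "green"
    if avg < 0.15:
        return "yellow"
    return "red"
```

With this shape, a frustration spike like the World 1-1 pit shows up as a large per-second delta on one channel, which drags the state's average into yellow or red territory.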

The dashboard shows the full emotional arc across sessions, where players deviated from intent, and a ranked list of the most painful moments. A natural language copilot called Sphinx lets you query the entire dataset in plain English.

For our demo we ran it on World 1-1 of Super Mario Bros. The first pit in the level produces a frustration spike well above the intended "tense but manageable" range. The verdict: FAIL. A 40-year-old design flaw, caught automatically.

How we built it

  • Desktop agent (Python): mss for screen capture, OpenCV, MediaPipe, and BLE for Apple Watch streaming. Chunks video into 10-second segments and uploads async to the backend.
  • Presage SDK: facial affect at 10 Hz, outputting frustration, confusion, delight, and boredom as 0-1 signals.
  • Backend (FastAPI on Vultr): receives chunks, runs Gemini Vision to extract DFA game states, resamples all streams to a unified 1 Hz timeline, computes intent delta, and generates verdicts.
  • Embeddings: each 10-second window is serialized into a text representation and embedded with bge-large-en, then stored in Actian VectorAI for semantic cross-session search.
  • Snowflake: stores raw, resampled, and fused data in a medallion architecture (Bronze / Silver / Gold).
  • Frontend (React + Vite + Tailwind): session management, live charts with Recharts, cross-tester comparison, and the Sphinx query interface.
  • Sphinx CLI: our backend client shells out to the Sphinx CLI directly. Ask any natural language question and it queries Snowflake, auto-generates charts, and returns structured JSON with base64-encoded images ready for the frontend.
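
The embedding step above hinges on serializing each fused window into text first. A minimal sketch of that serialization, assuming illustrative field names (the real field set and formatting may differ):

```python
# Hypothetical serializer for one fused 10-second window, so a text
# embedding model such as bge-large-en can index it for semantic search.
# Field names here are assumptions for illustration.

def window_to_text(window: dict) -> str:
    """Flatten a fused window into a single searchable string."""
    parts = [f"game_state={window['state']}"]
    for emotion in ("frustration", "confusion", "delight", "boredom"):
        parts.append(f"{emotion}={window[emotion]:.2f}")
    parts.append(f"heart_rate={window['heart_rate']:.0f}bpm")
    parts.append(f"intent_delta={window['intent_delta']:.2f}")
    return " ".join(parts)
```

Keeping the serialization flat and key=value shaped means queries like "moments of high frustration near a pit" land close to the right windows in embedding space.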

Challenges we ran into

Getting all five data streams (video, webcam, watch, game state, developer intent) onto the same 1 Hz timeline was harder than expected. Each source has a different sample rate and latency profile. Gemini Vision also needed careful prompt engineering to return strict JSON game state transitions reliably, especially on pixel-art NES graphics. BLE streaming from Apple Watch on Windows added another layer of pain.
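
The core of the alignment problem can be sketched with per-second bucketing: resample each stream to 1 Hz independently, then keep only the seconds every stream covers. This is a simplified stdlib sketch of the approach, not our production pipeline, which also has to handle per-source latency offsets:

```python
from statistics import mean

def resample_1hz(samples):
    """samples: list of (timestamp_seconds, value) pairs at any rate.
    Returns {whole_second: mean value} -- one bucket per second."""
    buckets = {}
    for t, v in samples:
        buckets.setdefault(int(t), []).append(v)
    return {sec: mean(vs) for sec, vs in sorted(buckets.items())}

def align(streams):
    """Given several {second: value} dicts, keep only the seconds
    present in every stream and zip their values together."""
    common = set.intersection(*(set(s) for s in streams))
    return {sec: [s[sec] for s in streams] for sec in sorted(common)}
```

The awkward cases are exactly the ones this sketch glosses over: a 10 Hz facial stream averages cleanly into each second, while a watch reporting every 1-2 seconds leaves gaps that need interpolation rather than intersection.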

Accomplishments that we're proud of

The intent delta system. The idea that a developer can pre-annotate the emotional arc they intended, and that the system automatically shows where reality diverged from it, is something we hadn't seen in a playtesting tool before. Watching it surface the World 1-1 pit as a genuine failure moment in real time was a good sign we built the right thing.

What we learned

The fusion step is where the product lives or dies. Raw emotion scores alone are noise. The moment you layer developer intent on top and compute a delta, the data becomes actionable. We also learned that WebSocket-based live streaming from a desktop agent to a cloud backend at 30 FPS is a solved problem only until you have four other streams running at the same time.

What's next for PatchLab

  • Packaging the desktop agent as a proper installer (Mac + Windows)
  • A Unity and Unreal SDK so game state transitions get logged automatically instead of inferred by Gemini
  • Longitudinal comparison across multiple build versions so studios can track whether a patch actually fixed the problem
  • Expanding Sphinx to write and execute full analytics pipelines, not just answer one-off questions
