PaperCut — About the Project
Inspiration
Every ML researcher knows the feeling: you read a paper, you get the idea, but the gap between understanding $\hat{y} = \text{softmax}(QK^T / \sqrt{d_k}) \cdot V$ and actually running it is enormous. Papers are dense, codebases are sparse, and the official implementation, if it exists, is usually a 3000-line research repo with zero comments and a broken requirements.txt.
We wanted to close that gap in one drag-and-drop.
What We Built
PaperCut is an autonomous ML research-to-code pipeline. You drop in an arXiv PDF or paste an abstract URL, and within minutes you have:
- Scope validation: a fast Gemini Flash check that confirms the paper actually proposes a new model, not a survey or an application paper
- A structured research contract: Gemini Pro extracts architecture parameters, training objectives, datasets, and novelty claims into a typed JSON schema
- A PyTorch scaffold: auto-generated starter code wired to the extracted blueprint
- A CUDA optimisation blueprint: hardware-level notes on where the architecture can be accelerated
- A chat interface: ask anything about the paper, grounded in the extracted context, with token-level streaming
The pipeline is built on LangGraph, which let us model the four nodes as a typed state graph with conditional routing. If Node 0 rejects a paper, the expensive Pro call never fires:
$$ \text{cost} = \underbrace{C_{\text{Flash}}}_{\text{always}} + \underbrace{C_{\text{Pro}}}_{\text{PASS only}} $$
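The gate can be sketched in plain Python. This is a simplified stand-in for LangGraph's compiled graph, not the actual pipeline code: the node names, the dict-shaped state, and the stubbed scope check are all illustrative assumptions.

```python
# Minimal sketch of the Node 0 gate. In the real system a compiled
# LangGraph StateGraph performs this dispatch and Node 0 calls Gemini
# Flash; here a plain function and a keyword check stand in for both.

def node_0_scope_check(state: dict) -> dict:
    # Stub for the cheap Flash scope call: PASS only for new-model papers.
    state["scope_valid"] = state["paper_kind"] == "new_model"
    return state

def route_after_node_0(state: dict) -> str:
    # The conditional edge: only PASS papers reach the expensive Pro node.
    return "node_1_extract" if state["scope_valid"] else "END"

def run_pipeline(state: dict) -> dict:
    state = node_0_scope_check(state)
    if route_after_node_0(state) == "END":
        state["status"] = "rejected"    # C_Pro is never spent
    else:
        state["status"] = "extracting"  # Gemini Pro would run here
    return state
```

The point of the conditional edge is visible in the cost equation above: a rejected paper exits at END and the Pro call never fires.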
Sessions persist to Google Cloud Storage, so nothing is lost on server restart. The frontend is a React + Framer Motion SPA with a VSCode-style resizable code viewer, live SSE pipeline progress, and global drag-and-drop.
How We Built It
Backend: FastAPI with async everywhere, generate_content_async for Gemini, asyncio.to_thread for GCS and CPU-bound work, and fire-and-forget asyncio.create_task for artefact uploads so the SSE stream is not blocked. LangGraph compiles the four-node graph with a conditional edge after Node 0:
$$ \text{route}(s) = \begin{cases} \text{node}_1 & \text{if } s.\texttt{scope\_valid} = \top \\ \text{END} & \text{otherwise} \end{cases} $$
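The fire-and-forget upload pattern can be sketched with the stdlib alone. The helper names below (upload_blob, handle_artifact) are hypothetical stand-ins, and the dict-backed store replaces the real blocking GCS SDK call:

```python
import asyncio

# Sketch of fire-and-forget artefact uploads: asyncio.to_thread keeps the
# blocking upload off the event loop, and create_task lets the SSE stream
# continue without awaiting it. Real code should keep a reference to the
# task (as we do here) so it is not garbage-collected mid-flight.

def upload_blob(name: str, data: bytes, store: dict) -> None:
    store[name] = data  # the real version calls the (blocking) GCS SDK

async def handle_artifact(name: str, data: bytes, store: dict) -> asyncio.Task:
    task = asyncio.create_task(asyncio.to_thread(upload_blob, name, data, store))
    # ... SSE streaming continues immediately; the upload runs behind us
    return task

async def main() -> dict:
    store: dict = {}
    task = await handle_artifact("papers/demo.pdf", b"%PDF-...", store)
    await task  # the demo waits so the result can be inspected
    return store
```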
Gemini integration: All calls go through the Vertex AI SDK, vertexai, not google-generativeai. Every JSON response is passed through strip_markdown_fences() before json.loads(). Gemini will happily wrap structured output in triple backticks even when you ask it not to.
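A fence-stripper in this spirit is short enough to show in full. This is a hedged reconstruction, not the actual strip_markdown_fences() from the codebase, which may differ in details:

```python
import json
import re

# Remove a leading ```json (or bare ```) fence and a trailing ``` so the
# model's "JSON" can actually be json.loads'd. Input without fences passes
# through unchanged.

_FENCE = re.compile(r"^\s*```[a-zA-Z0-9]*\s*\n?|\n?\s*```\s*$")

def strip_markdown_fences(text: str) -> str:
    return _FENCE.sub("", text).strip()

raw = '```json\n{"d_model": 512, "layers": 6}\n```'
contract = json.loads(strip_markdown_fences(raw))
```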
Frontend: React 19 + Vite. State lives entirely in App.jsx and flows down as props, with no external state library needed at this scale. Streaming is handled by a thin SSE wrapper in api/client.js that parses data: lines and dispatches typed events. Animations use Framer Motion. Syntax highlighting uses react-syntax-highlighter.
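The wrapper itself lives in JavaScript in api/client.js, but the text/event-stream framing it parses is simple enough to sketch language-agnostically; a minimal Python equivalent (the stream contents are made up for illustration):

```python
# Minimal SSE frame parser: `data:` lines accumulate into an event, an
# optional `event:` line names its type, and a blank line terminates it,
# per the text/event-stream format. Illustrative only.

def parse_sse(lines):
    event, data = "message", []
    for line in lines:
        if line == "":                       # blank line ends the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())

stream = ["event: token", "data: Hello", "", "data: world", ""]
events = list(parse_sse(stream))
```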
Challenges
The scope disagreement problem. Node 0 (Flash) and Node 1 (Pro) would sometimes contradict each other: Flash passes a paper, then Pro decides it is out of scope and refuses to extract. The fix was a two-layer protocol: if Node 0 already passed, Node 1 is instructed to extract unconditionally, with no second-guessing. Scope authority belongs to Node 0 alone.
Structured JSON from an LLM. Getting Gemini Pro to return a strictly typed ResearchContract reliably required a tight system prompt, additionalProperties: false in the schema, fence-stripping, and retry logic. Even then, numeric fields like $d_{\text{model}}$ sometimes arrive as strings.
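One mitigation for string-typed numerics is a coercion pass after parsing. This sketch is our assumption about how such a pass could look; the field names are illustrative and not the real ResearchContract schema:

```python
# Coerce numeric-looking strings in an extracted contract back to numbers.
# NUMERIC_FIELDS is an illustrative list, not the actual schema.

NUMERIC_FIELDS = {"d_model", "n_layers", "n_heads", "learning_rate"}

def coerce_numbers(contract: dict) -> dict:
    fixed = dict(contract)
    for key in NUMERIC_FIELDS & contract.keys():
        value = contract[key]
        if isinstance(value, str):
            try:
                fixed[key] = int(value)
            except ValueError:
                fixed[key] = float(value)  # may still raise -> retry path
    return fixed
```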
GCS artefact wiring. The artifact store silently used a None bucket name because the config key was GCS_BUCKET_NAME in the store but GCP_BUCKET_NAME everywhere else. Files were being "uploaded" to /None/papers/... with no error, just quiet data loss. Config key mismatches are the iceberg of distributed systems bugs.
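The generic defence is a fail-fast guard at startup so a missing key raises instead of silently becoming None. A sketch (require() is a hypothetical helper, not code from the project):

```python
import os

# Fail fast on missing config instead of letting a None bucket name leak
# into paths like /None/papers/... as described above.

def require(key: str, env=os.environ) -> str:
    value = env.get(key)
    if not value or value == "None":
        raise RuntimeError(f"missing required config: {key}")
    return value
```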
CORS and DELETE. Starlette's allow_methods=["*"] does not include DELETE. This is documented, but only after you have spent an hour wondering why your frontend gets a 405 on every session delete.
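The straightforward fix is to spell the methods out instead of relying on the wildcard. A sketch of the middleware setup (the origin value is a placeholder, not the project's real config):

```python
from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # placeholder dev origin
    allow_methods=["GET", "POST", "DELETE", "OPTIONS"],  # DELETE spelled out
    allow_headers=["*"],
)
```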
Real-time streaming UX. SSE is straightforward on the server side. The challenge is making it feel good on the client: progressive pipeline steps with time estimates, scroll-to-bottom that does not fight the user, and a streaming assistant bubble that shows a status line, Thinking... → Reading context... → Generating..., before the first token arrives.
What We Learned
- LangGraph's typed AgentState makes multi-node pipelines dramatically easier to reason about than raw async chains. The graph is the documentation.
- Vertex AI's generate_content_async with SSE passthrough can sustain about 50 tokens per second to the browser with no perceptible latency.
- A two-second splash screen buys enough time to load sessions from GCS on startup without the user ever seeing an empty state.
- The right abstraction boundary for a hackathon is one clean state machine, zero ORMs, and one CSS file.