PaperCut — About the Project
Inspiration
Every ML researcher knows the feeling: you read a paper, you get the idea, but the gap between understanding $\hat{y} = \text{softmax}(QK^T / \sqrt{d_k}) \cdot V$ and actually running it is enormous. Papers are dense, codebases are sparse, and the official implementation, if it exists, is usually a 3000-line research repo with zero comments and a broken requirements.txt.
We wanted to close that gap in one drag-and-drop.
What We Built
PaperCut is an autonomous ML research-to-code pipeline. You drop in an arXiv PDF or paste an abstract URL, and within minutes you have:
- Scope validation: a fast Gemini Flash check that confirms the paper actually proposes a new model, not a survey or an application paper
- A structured research contract: Gemini Pro extracts architecture parameters, training objectives, datasets, and novelty claims into a typed JSON schema
- A PyTorch scaffold: auto-generated starter code wired to the extracted blueprint
- A CUDA optimisation blueprint: hardware-level notes on where the architecture can be accelerated
- A chat interface: ask anything about the paper, grounded in the extracted context, with token-level streaming
The pipeline is built on LangGraph, which let us model the four nodes as a typed state graph with conditional routing. If Node 0 rejects a paper, the expensive Pro call never fires:
$$ \text{cost} = \underbrace{C_{\text{Flash}}}_{\text{always}} + \underbrace{C_{\text{Pro}}}_{\text{PASS only}} $$
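The gate can be sketched in plain Python. This is a simplified stand-in for LangGraph's compiled graph, not the actual pipeline code: the node names, the dict-shaped state, and the stubbed scope check are all illustrative assumptions.

```python
# Minimal sketch of the Node 0 gate. In the real system a compiled
# LangGraph StateGraph performs this dispatch and Node 0 calls Gemini
# Flash; here a plain function and a keyword check stand in for both.

def node_0_scope_check(state: dict) -> dict:
    # Stub for the cheap Flash scope call: PASS only for new-model papers.
    state["scope_valid"] = state["paper_kind"] == "new_model"
    return state

def route_after_node_0(state: dict) -> str:
    # The conditional edge: only PASS papers reach the expensive Pro node.
    return "node_1_extract" if state["scope_valid"] else "END"

def run_pipeline(state: dict) -> dict:
    state = node_0_scope_check(state)
    if route_after_node_0(state) == "END":
        state["status"] = "rejected"    # C_Pro is never spent
    else:
        state["status"] = "extracting"  # Gemini Pro would run here
    return state
```

The point of the conditional edge is visible in the cost equation above: a rejected paper exits at END and the Pro call never fires.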
Sessions persist to Google Cloud Storage, so nothing is lost on server restart. The frontend is a React + Framer Motion SPA with a VSCode-style resizable code viewer, live SSE pipeline progress, and global drag-and-drop.
How We Built It
Backend: FastAPI with async everywhere, generate_content_async for Gemini, asyncio.to_thread for GCS and CPU-bound work, and fire-and-forget asyncio.create_task for artefact uploads so the SSE stream is not blocked. LangGraph compiles the four-node graph with a conditional edge after Node 0:
$$ \text{route}(s) = \begin{cases} \text{node}_1 & \text{if } s.\texttt{scope\_valid} = \top \\ \text{END} & \text{otherwise} \end{cases} $$
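The fire-and-forget upload pattern can be sketched with the stdlib alone. The helper names below (upload_blob, handle_artifact) are hypothetical stand-ins, and the dict-backed store replaces the real blocking GCS SDK call:

```python
import asyncio

# Sketch of fire-and-forget artefact uploads: asyncio.to_thread keeps the
# blocking upload off the event loop, and create_task lets the SSE stream
# continue without awaiting it. Real code should keep a reference to the
# task (as we do here) so it is not garbage-collected mid-flight.

def upload_blob(name: str, data: bytes, store: dict) -> None:
    store[name] = data  # the real version calls the (blocking) GCS SDK

async def handle_artifact(name: str, data: bytes, store: dict) -> asyncio.Task:
    task = asyncio.create_task(asyncio.to_thread(upload_blob, name, data, store))
    # ... SSE streaming continues immediately; the upload runs behind us
    return task

async def main() -> dict:
    store: dict = {}
    task = await handle_artifact("papers/demo.pdf", b"%PDF-...", store)
    await task  # the demo waits so the result can be inspected
    return store
```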
Gemini integration: All calls go through the Vertex AI SDK, vertexai, not google-generativeai. Every JSON response is passed through strip_markdown_fences() before json.loads(). Gemini will happily wrap structured output in triple backticks even when you ask it not to.
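A fence-stripper in this spirit is short enough to show in full. This is a hedged reconstruction, not the actual strip_markdown_fences() from the codebase, which may differ in details:

```python
import json
import re

# Remove a leading ```json (or bare ```) fence and a trailing ``` so the
# model's "JSON" can actually be json.loads'd. Input without fences passes
# through unchanged.

_FENCE = re.compile(r"^\s*```[a-zA-Z0-9]*\s*\n?|\n?\s*```\s*$")

def strip_markdown_fences(text: str) -> str:
    return _FENCE.sub("", text).strip()

raw = '```json\n{"d_model": 512, "layers": 6}\n```'
contract = json.loads(strip_markdown_fences(raw))
```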
Frontend: React 19 + Vite. State lives entirely in App.jsx and flows down as props, with no external state library needed at this scale. Streaming is handled by a thin SSE wrapper in api/client.js that parses data: lines and dispatches typed events. Animations use Framer Motion. Syntax highlighting uses react-syntax-highlighter.
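The wrapper itself lives in JavaScript in api/client.js, but the text/event-stream framing it parses is simple enough to sketch language-agnostically; a minimal Python equivalent (the stream contents are made up for illustration):

```python
# Minimal SSE frame parser: `data:` lines accumulate into an event, an
# optional `event:` line names its type, and a blank line terminates it,
# per the text/event-stream format. Illustrative only.

def parse_sse(lines):
    event, data = "message", []
    for line in lines:
        if line == "":                       # blank line ends the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())

stream = ["event: token", "data: Hello", "", "data: world", ""]
events = list(parse_sse(stream))
```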
Challenges
The scope disagreement problem. Node 0 (Flash) and Node 1 (Pro) would sometimes contradict each other: Flash passes a paper, then Pro decides it is out of scope and refuses to extract. The fix was a two-layer protocol: if Node 0 already passed, Node 1 is instructed to extract unconditionally, with no second-guessing. Scope authority belongs to Node 0 alone.
Structured JSON from an LLM. Getting Gemini Pro to return a strictly typed ResearchContract reliably required a tight system prompt, additionalProperties: false in the schema, fence-stripping, and retry logic. Even then, numeric fields like $d_{\text{model}}$ sometimes arrive as strings.
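One mitigation for string-typed numerics is a coercion pass after parsing. This sketch is our assumption about how such a pass could look; the field names are illustrative and not the real ResearchContract schema:

```python
# Coerce numeric-looking strings in an extracted contract back to numbers.
# NUMERIC_FIELDS is an illustrative list, not the actual schema.

NUMERIC_FIELDS = {"d_model", "n_layers", "n_heads", "learning_rate"}

def coerce_numbers(contract: dict) -> dict:
    fixed = dict(contract)
    for key in NUMERIC_FIELDS & contract.keys():
        value = contract[key]
        if isinstance(value, str):
            try:
                fixed[key] = int(value)
            except ValueError:
                fixed[key] = float(value)  # may still raise -> retry path
    return fixed
```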
GCS artefact wiring. The artifact store silently used a None bucket name because the config key was GCS_BUCKET_NAME in the store but GCP_BUCKET_NAME everywhere else. Files were being "uploaded" to /None/papers/... with no error, just quiet data loss. Config key mismatches are the iceberg of distributed systems bugs.
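The generic defence is a fail-fast guard at startup so a missing key raises instead of silently becoming None. A sketch (require() is a hypothetical helper, not code from the project):

```python
import os

# Fail fast on missing config instead of letting a None bucket name leak
# into paths like /None/papers/... as described above.

def require(key: str, env=os.environ) -> str:
    value = env.get(key)
    if not value or value == "None":
        raise RuntimeError(f"missing required config: {key}")
    return value
```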
CORS and DELETE. Starlette's allow_methods=["*"] does not include DELETE. This is documented, but only after you have spent an hour wondering why your frontend gets a 405 on every session delete.
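The straightforward fix is to spell the methods out instead of relying on the wildcard. A sketch of the middleware setup (the origin value is a placeholder, not the project's real config):

```python
from fastapi import FastAPI
from starlette.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # placeholder dev origin
    allow_methods=["GET", "POST", "DELETE", "OPTIONS"],  # DELETE spelled out
    allow_headers=["*"],
)
```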
Real-time streaming UX. SSE is straightforward on the server side. The challenge is making it feel good on the client: progressive pipeline steps with time estimates, scroll-to-bottom that does not fight the user, and a streaming assistant bubble that shows a status line, Thinking... → Reading context... → Generating..., before the first token arrives.
What We Learned
- LangGraph's typed AgentState makes multi-node pipelines dramatically easier to reason about than raw async chains. The graph is the documentation.
- Vertex AI's generate_content_async with SSE passthrough can sustain about 50 tokens per second to the browser with no perceptible latency.
- A two-second splash screen buys enough time to load sessions from GCS on startup without the user ever seeing an empty state.
- The right abstraction boundary for a hackathon is one clean state machine, zero ORMs, and one CSS file.