- Live Demo: https://synth-lab-203573749453.us-central1.run.app./
- Video Link: https://youtu.be/_lGVlJ6S7NQ
- Page: https://sanial.github.io/proj_synth_lab.html

A full-stack TypeScript application that transforms arXiv research papers into interactive visual artifacts. Instead of producing text summaries, Synth Lab turns paper content into architecture diagrams, keyword maps, and cross-paper conceptual comparisons. Built in 10 days as a solo entry for the Google Gemini Agent Competition.

Academic research papers are dense and static. Most AI tools summarize them as walls of text that lose the underlying architecture and methodology structure. Readers end up manually tracing through complex systems buried in prose, spending hours building mental models that an AI could generate in seconds.

The backend is an intentionally thin Express server hosted on Google Cloud Run. It acts as a secure gateway between the frontend and external services: it keeps the Gemini API key off the client, proxies arXiv searches and PDF fetches, and shapes prompts before sending them to Gemini. The frontend is a React and Vite application that owns all user orchestration, session state, and rendering.

The core data flow is: arXiv search → backend normalizes the Atom feed into paper objects → selected papers sent to Gemini for analysis → structured JSON returned → React renders across four tabs.

The backend makes three distinct Gemini calls. Conceptual Dive runs a two-pass pipeline where the first call analyzes paper content and the second converts that analysis into Mermaid diagram JSON. Audio narration for the Research Agent tab is handled by a separate Gemini TTS call.
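
The Atom normalization step in the data flow could look roughly like the sketch below. It is not Synth Lab's actual code: the `Paper` shape and the function names are assumptions for illustration, and regular expressions stand in for a real XML parser to keep the example self-contained.

```typescript
// Hypothetical paper shape; the field names are assumptions, not Synth Lab's actual types.
interface Paper {
  id: string;
  title: string;
  summary: string;
  authors: string[];
  pdfUrl: string | null;
}

// Collapse the line breaks and indentation arXiv leaves inside titles and abstracts.
const collapse = (s: string): string => s.replace(/\s+/g, " ").trim();

// Return the text of the first <tag>...</tag> element, or "" if it is absent.
function firstTag(xml: string, tag: string): string {
  const m = xml.match(new RegExp(`<${tag}[^>]*>([\\s\\S]*?)</${tag}>`));
  return m ? collapse(m[1] ?? "") : "";
}

// Turn an arXiv Atom feed into plain paper objects the frontend can render.
function normalizeAtomFeed(xml: string): Paper[] {
  const entries: string[] = xml.match(/<entry>[\s\S]*?<\/entry>/g) ?? [];
  return entries.map((entry) => {
    // Collect every <author><name>...</name></author> in document order.
    const authors: string[] = [];
    const nameRe = /<name>([\s\S]*?)<\/name>/g;
    let m: RegExpExecArray | null;
    while ((m = nameRe.exec(entry)) !== null) {
      authors.push(collapse(m[1] ?? ""));
    }

    // arXiv marks the PDF link with title="pdf".
    const pdfLink = entry.match(/<link\b[^>]*title="pdf"[^>]*>/);
    const href = pdfLink ? pdfLink[0].match(/href="([^"]+)"/) : null;

    // The <id> element is a URL such as http://arxiv.org/abs/<id>; keep the part after /abs/.
    const idUrl = firstTag(entry, "id");
    return {
      id: idUrl.split("/abs/").pop() ?? idUrl,
      title: firstTag(entry, "title"),
      summary: firstTag(entry, "summary"),
      authors,
      pdfUrl: href ? (href[1] ?? null) : null,
    };
  });
}
```

A production gateway would likely use a proper XML parser here, since regex extraction breaks on CDATA sections or unusual attribute ordering.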

The app has four tabs:

- Technical Synthesis turns one or more papers into a Mermaid flowchart and a final architecture diagram. Users can trigger sub-diagrams to drill deeper into specific technical modules.
- Research Agent provides deeper written analysis of the paper with optional audio narration.
- Deep Dive parses the paper PDF with pdf.js and builds a D3 bubble map from keyword frequency, giving a visual sense of thematic weight across the document.
- Conceptual Dive runs the two-pass Gemini pipeline across multiple papers and renders cross-paper comparative diagrams.
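
As an illustration of the keyword-frequency step behind the Deep Dive bubble map, here is a minimal sketch. It assumes the PDF text has already been extracted (for example by pdf.js); the tokenization rule, stop-word list, and `KeywordBubble` shape are assumptions, not the app's actual choices.

```typescript
// Hypothetical datum for one bubble: the count would drive the bubble's size.
interface KeywordBubble {
  keyword: string;
  count: number;
}

// A deliberately tiny stop-word list; a real one would be much longer.
const STOP_WORDS = new Set<string>([
  "the", "and", "for", "with", "that", "this", "are", "was", "from", "our",
]);

// Count how often each keyword occurs and return the most frequent ones.
function keywordFrequencies(text: string, topN = 20, minLength = 3): KeywordBubble[] {
  const counts = new Map<string, number>();
  // Lowercase the text and keep alphabetic tokens (hyphens allowed inside a word).
  const tokens: string[] = text.toLowerCase().match(/[a-z][a-z-]*/g) ?? [];
  for (const token of tokens) {
    if (token.length < minLength || STOP_WORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([keyword, count]) => ({ keyword, count }))
    // Most frequent first; ties broken alphabetically so output is stable.
    .sort((a, b) => b.count - a.count || a.keyword.localeCompare(b.keyword))
    .slice(0, topN);
}
```

The resulting array is the kind of flat `{ keyword, count }` list that a D3 pack layout can size bubbles from, with `count` as the value accessor.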

Known limitations:

- Conceptual Dive is slow because Gemini analyzes paper content and generates diagrams in the same sequential pipeline. Splitting these into separate cached stages would fix this.
- PDF fetches are never cached, so every request re-downloads the same files from arXiv. A persistent storage layer would eliminate this on repeat requests.
- Mermaid diagram generation breaks on complex papers, and there is no retry or fallback logic, so syntax errors surface directly to the user. Schema enforcement and a fallback rendering path would fix this.
- The streaming UI is simulated: the backend returns a full Gemini response and the client renders it incrementally. Real server-sent event streaming is the next step.
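
To make the PDF caching idea concrete, the sketch below wraps a fetcher in an in-memory cache keyed by arXiv ID. This is an assumption-laden stand-in: the `PdfFetcher` type and `withPdfCache` name are hypothetical, and a `Map` only lives as long as the server process, so it does not replace the persistent storage layer described above.

```typescript
// Hypothetical fetcher signature: given an arXiv ID, resolve to the PDF bytes.
type PdfFetcher = (arxivId: string) => Promise<Uint8Array>;

// Wrap a fetcher so repeat requests for the same paper reuse the first download.
function withPdfCache(fetchPdf: PdfFetcher): PdfFetcher {
  // Caching the promise (not the bytes) also de-duplicates concurrent requests.
  const cache = new Map<string, Promise<Uint8Array>>();
  return (arxivId: string): Promise<Uint8Array> => {
    const hit = cache.get(arxivId);
    if (hit) return hit;
    const pending = fetchPdf(arxivId).catch((err: unknown) => {
      // Do not keep failed downloads in the cache; let the next request retry.
      cache.delete(arxivId);
      throw err;
    });
    cache.set(arxivId, pending);
    return pending;
  };
}
```

Swapping the `Map` for a disk or object-store lookup would keep the same wrapper shape while surviving restarts.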

Planned next steps:

- Splitting the Conceptual Dive pipeline into cached stages to reduce latency.
- Adding a persistent database so users can save and return to diagrams and searches across sessions.
- Implementing real SSE streaming from Gemini through the backend.
- Adding structured output validation for Mermaid generation with fallback parsing.
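
One way the structured validation and fallback for Mermaid generation might work is sketched below. The `DiagramSpec` schema (nodes with `id` and `label`, edges with `from` and `to`) is a hypothetical contract chosen for illustration; the JSON that Synth Lab actually requests from Gemini may look different.

```typescript
// Hypothetical diagram schema; Synth Lab's real JSON contract may differ.
interface DiagramSpec {
  nodes: { id: string; label: string }[];
  edges: { from: string; to: string }[];
}

const FALLBACK_DIAGRAM = 'flowchart TD\n  fallback["Diagram unavailable"]';
// Restrict node IDs to simple identifiers so they cannot break Mermaid syntax.
const ID_PATTERN = /^[A-Za-z][A-Za-z0-9_]*$/;

// Parse and validate model output; return null instead of throwing on bad input.
function parseDiagramSpec(raw: string): DiagramSpec | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const { nodes, edges } = data as { nodes?: unknown; edges?: unknown };
  if (!Array.isArray(nodes) || !Array.isArray(edges)) return null;

  const ids = new Set<string>();
  for (const n of nodes) {
    if (typeof n?.id !== "string" || !ID_PATTERN.test(n.id)) return null;
    if (typeof n.label !== "string") return null;
    ids.add(n.id);
  }
  // Every edge must connect two declared nodes.
  for (const e of edges) {
    if (!ids.has(e?.from) || !ids.has(e?.to)) return null;
  }
  return { nodes, edges };
}

// Build Mermaid flowchart text from a validated spec.
function toMermaid(spec: DiagramSpec): string {
  const lines = ["flowchart TD"];
  for (const n of spec.nodes) {
    // Swap double quotes for single quotes so the quoted label stays well-formed.
    lines.push(`  ${n.id}["${n.label.replace(/"/g, "'")}"]`);
  }
  for (const e of spec.edges) lines.push(`  ${e.from} --> ${e.to}`);
  return lines.join("\n");
}

// Render the model's diagram if it validates, otherwise a safe placeholder.
function renderOrFallback(raw: string): string {
  const spec = parseDiagramSpec(raw);
  return spec ? toMermaid(spec) : FALLBACK_DIAGRAM;
}
```

Generating the Mermaid text from validated JSON, rather than asking the model for Mermaid syntax directly, means a malformed response degrades to a placeholder instead of a parser error in the UI. A retry of the Gemini call could be added before falling back.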

Prerequisites: Node.js

- Install dependencies: `npm install`
- Set the `GEMINI_API_KEY` in .env.local to your Gemini API key
- Run the app: `npm run dev`

https://github.com/user-attachments/assets/e1352277-25a7-4407-aadf-a30fad279346

https://github.com/user-attachments/assets/48034d6a-b350-4b38-ae0b-8d2d787608c1