
sanial/synth-lab



Synth Lab is a full-stack TypeScript application that transforms arXiv research papers into interactive visual artifacts. Instead of producing text summaries, it turns paper content into architecture diagrams, keyword maps, and cross-paper conceptual comparisons. It was built in 10 days as a solo entry for the Google Gemini Agent Competition.

Problem

Academic research papers are dense and static. Most AI tools summarize them as walls of text that lose the underlying architecture and methodology structure. Readers end up manually tracing through complex systems buried in prose, spending hours building mental models that an AI could generate in seconds.

Architecture

The backend is an intentionally thin Express server hosted on Google Cloud Run. It acts as a secure gateway between the frontend and external services: it keeps the Gemini API key off the client, proxies arXiv searches and PDF fetches, and shapes prompts before sending them to Gemini. The frontend is a React and Vite application that owns all user orchestration, session state, and rendering.

The core data flow is: arXiv search → backend normalizes the Atom feed into paper objects → selected papers sent to Gemini for analysis → structured JSON returned → React renders across four tabs.

The backend makes three distinct Gemini calls. Conceptual Dive runs a two-pass pipeline where the first call analyzes paper content and the second converts that analysis into Mermaid diagram JSON. Audio narration for the Research Agent tab is handled by a separate Gemini TTS call.
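As an illustration of the "normalizes the Atom feed into paper objects" step, here is a minimal sketch. It is not the project's actual code: the `Paper` shape, the function names, and the regex-based parsing are all assumptions made for brevity. A real backend would more likely use a proper XML parser, and this sketch does not decode XML entities.

```typescript
// Hypothetical shape of a normalized paper object; the field names are assumptions.
interface Paper {
  id: string;
  title: string;
  summary: string;
  authors: string[];
  pdfUrl: string | null;
}

// Collapse runs of whitespace, since Atom text nodes are often wrapped across lines.
function squash(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Return the text of the first <tag>...</tag> inside a fragment, or "" if absent.
function firstTag(fragment: string, tag: string): string {
  const match = fragment.match(new RegExp(`<${tag}\\b[^>]*>([\\s\\S]*?)</${tag}>`));
  return match ? squash(match[1]) : "";
}

// Turn an arXiv-style Atom feed into an array of paper objects.
function normalizeAtomFeed(xml: string): Paper[] {
  const entries = xml.match(/<entry>[\s\S]*?<\/entry>/g) || [];
  return entries.map((entry) => {
    // Collect every <name> element inside the entry as an author.
    const authors: string[] = [];
    const nameRe = /<name>([\s\S]*?)<\/name>/g;
    let m: RegExpExecArray | null;
    while ((m = nameRe.exec(entry)) !== null) {
      authors.push(squash(m[1]));
    }
    // arXiv entries carry a <link title="pdf" href="..."> element for the PDF.
    const pdfLink = entry.match(/<link\b[^>]*title="pdf"[^>]*>/);
    const href = pdfLink ? pdfLink[0].match(/href="([^"]*)"/) : null;
    return {
      id: firstTag(entry, "id"),
      title: firstTag(entry, "title"),
      summary: firstTag(entry, "summary"),
      authors,
      pdfUrl: href ? href[1] : null,
    };
  });
}
```

Keeping this step on the backend means the frontend only ever sees plain JSON and never has to parse XML.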

The Four Tabs

Technical Synthesis turns one or more papers into a Mermaid flowchart and a final architecture diagram. Users can trigger sub-diagrams to drill deeper into specific technical modules.

Research Agent provides deeper written analysis of the paper with optional audio narration.

Deep Dive parses the paper PDF with pdf.js and builds a D3 bubble map from keyword frequency, giving a visual sense of thematic weight across the document.

Conceptual Dive runs the two-pass Gemini pipeline across multiple papers and renders cross-paper comparative diagrams.
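The keyword-frequency step behind the Deep Dive bubble map can be sketched roughly as follows. This is an assumption-laden illustration, not the project's code: it presumes the PDF text has already been extracted with pdf.js, and the `KeywordBubble` type, the stop-word list, and the tokenization rule are all hypothetical.

```typescript
// Hypothetical datum for one bubble: the keyword and how often it occurs.
interface KeywordBubble {
  keyword: string;
  count: number;
}

// A deliberately tiny stop-word list; a real one would be much longer.
const STOP_WORDS = new Set([
  "the", "and", "for", "are", "with", "that", "this", "from", "was", "our",
]);

// Count word occurrences in already-extracted text and keep the top N keywords.
function keywordFrequencies(text: string, topN: number = 20): KeywordBubble[] {
  const counts = new Map<string, number>();
  // Tokens are lowercase runs of letters and hyphens, at least three characters long.
  const words = text.toLowerCase().match(/[a-z][a-z-]{2,}/g) || [];
  for (const word of words) {
    if (STOP_WORDS.has(word)) continue;
    counts.set(word, (counts.get(word) || 0) + 1);
  }
  const bubbles: KeywordBubble[] = [];
  counts.forEach((count, keyword) => bubbles.push({ keyword, count }));
  // Sort by descending count, breaking ties alphabetically for stable output.
  bubbles.sort((a, b) => b.count - a.count || a.keyword.localeCompare(b.keyword));
  return bubbles.slice(0, topN);
}
```

Each `count` could then serve as the value that sizes a bubble, for example through a D3 pack layout, so that frequent terms appear visually heavier.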

Known Limitations

Conceptual Dive is slow because Gemini analyzes paper content and generates diagrams in the same sequential pipeline. Splitting these into separate cached stages would fix this.

PDF fetches are never cached, so every request re-downloads the same files from arXiv. A persistent storage layer would eliminate this on repeat requests.

Mermaid diagram generation breaks on complex papers, and there is no retry or fallback logic, so syntax errors surface directly to the user. Schema enforcement and a fallback rendering path would fix this.

The streaming UI is simulated: the backend returns a full Gemini response and the client renders it incrementally. Real server-sent event streaming is the next step.
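One possible shape for the missing Mermaid retry and fallback path is sketched below. Everything here is hypothetical: `generateDiagram`, `DiagramResult`, and `looksLikeFlowchart` are illustrative names, the generator is injected so that no Gemini call is assumed, and the validator is a cheap heuristic standing in for a real Mermaid parse (it would reject some valid but unusual node shapes).

```typescript
// Either a validated diagram or a signal that the UI should render a fallback.
type DiagramResult =
  | { kind: "diagram"; source: string; attempts: number }
  | { kind: "fallback"; message: string; attempts: number };

// Heuristic check only: a flowchart header plus balanced brackets.
// This is a stand-in for a real parser, not a full Mermaid validator.
function looksLikeFlowchart(source: string): boolean {
  const trimmed = source.trim();
  if (!/^(flowchart|graph)\s+(TD|TB|BT|LR|RL)\b/.test(trimmed)) return false;
  const closers: Record<string, string> = { "]": "[", ")": "(", "}": "{" };
  const stack: string[] = [];
  for (let i = 0; i < trimmed.length; i++) {
    const ch = trimmed[i];
    if (ch === "[" || ch === "(" || ch === "{") {
      stack.push(ch);
    } else if (ch === "]" || ch === ")" || ch === "}") {
      if (stack.pop() !== closers[ch]) return false;
    }
  }
  return stack.length === 0;
}

// Call the generator up to maxAttempts times and return the first valid diagram.
async function generateDiagram(
  generate: (attempt: number) => Promise<string>,
  validate: (source: string) => boolean,
  maxAttempts: number = 3,
): Promise<DiagramResult> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const source = await generate(attempt);
      if (validate(source)) {
        return { kind: "diagram", source, attempts: attempt };
      }
    } catch {
      // A failed call is treated like an invalid diagram: move on to the next attempt.
    }
  }
  return {
    kind: "fallback",
    message: "Diagram could not be generated; showing the text analysis instead.",
    attempts: maxAttempts,
  };
}
```

With a result type like this, the client can branch on `kind` and show a plain-text view instead of surfacing a raw syntax error.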

What's Next

The next steps are to split the Conceptual Dive pipeline into cached stages to reduce latency, add a persistent database so users can save and return to diagrams and searches across sessions, implement real SSE streaming from Gemini through the backend, and add structured output validation for Mermaid generation with fallback parsing.
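For the planned SSE streaming, the wire format itself is simple: each event is a set of `field: value` lines terminated by a blank line, served with the `text/event-stream` content type. The sketch below shows only that framing. The function names, the `chunk` and `done` event names, and the JSON payload shape are assumptions, and wiring this into Express and the Gemini client is left out.

```typescript
// Format one server-sent event. Multi-line data is split into one "data:" line per line.
function formatSseEvent(data: string, event?: string): string {
  const lines: string[] = [];
  if (event) lines.push(`event: ${event}`);
  for (const line of data.split(/\r?\n/)) {
    lines.push(`data: ${line}`);
  }
  // A blank line terminates the event.
  return lines.join("\n") + "\n\n";
}

// Wrap model output chunks as SSE frames, ending with a hypothetical "done" event.
function toSseFrames(chunks: string[]): string[] {
  const frames = chunks.map((chunk) =>
    formatSseEvent(JSON.stringify({ text: chunk }), "chunk"),
  );
  frames.push(formatSseEvent("end", "done"));
  return frames;
}
```

On the server, each frame would be written to the response as soon as its chunk arrives, and the client could listen for the `chunk` and `done` events with an `EventSource`.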

Run Locally

Prerequisites: Node.js

  1. Install dependencies: npm install

  2. Set the GEMINI_API_KEY in .env.local to your Gemini API key

  3. Run the app: npm run dev


GCloud Proof videos

https://github.com/user-attachments/assets/e1352277-25a7-4407-aadf-a30fad279346

https://github.com/user-attachments/assets/48034d6a-b350-4b38-ae0b-8d2d787608c1
