SciFetch is an autonomous AI agent designed to search, synthesize, and generate scientific literature reports based on natural language prompts.
It leverages modern AI and web technologies: LangChain for orchestration, OpenAI models for summarization, and academic APIs for up-to-date content retrieval. The final output is delivered as a styled, downloadable PDF report, accessible through a clean web interface.
Try it live at: https://scifetch.vercel.app
- Introduction
- Documentation
- Key Features
- Scientific Domain Coverage
- Installation
- Usage
- Future Work
- Final Words
SciFetch is a full-stack autonomous system designed to assist researchers and professionals in exploring scientific literature efficiently through AI-powered summarization and presentation.
Born from the need to automate scientific information retrieval, SciFetch queries multiple trusted academic APIs, extracts relevant publications, and synthesizes a human-readable report in PDF format.
It combines the orchestration capabilities of LangChain, the language generation power of OpenAI models, and a web-friendly interface built with Next.js to offer:
- Fast and structured access to scientific knowledge.
- Reliable summarization of complex topics from multiple sources.
- Ready-to-use, visually styled PDF reports.
- Deployment flexibility, with both API and browser access.
The platform serves both as a research assistant and as a proof of concept for combining autonomous retrieval, modern web development, and scientific APIs into an end-to-end application.
Additional technical documentation is available in the /docs directory.
- ARCHITECTURE.md
Provides a detailed overview of the repository structure, backend and frontend components, API integrations, and the main technical building blocks of SciFetch.
This document is intended for readers who want a deeper technical view of how the system is organized beyond the high-level project overview in this README.
- Autonomous Literature Agent: Uses a retrieval pipeline with source-specific tools to query the most relevant academic APIs for each request.
- Multi-Source Scientific Retrieval: Aggregates results from PubMed, arXiv, OpenAlex, Europe PMC, and CrossRef to broaden topical coverage and source diversity.
- LLM-Powered Summarization: Synthesizes complex, multi-source information into a cohesive and accessible summary using OpenAI models.
- Styled PDF Report Generation: Outputs are delivered as downloadable, professionally formatted PDF documents using a shared HTML report template rendered through a high-fidelity PDF workflow.
- Modern Web Interface: Built with Next.js, the frontend allows users to submit research prompts, inspect the generated report preview, and download the final PDF directly from the browser.
- Full Public Deployment: The backend is deployed on Render and the frontend on Vercel, providing instant access at https://scifetch.vercel.app.
- Graceful Failure Handling: If one API fails or returns incomplete data, the pipeline continues processing with the remaining sources.
- Secure API Usage: Requires an OpenAI API key, sent per request and handled server-side during execution.
- Tested for Robustness: Includes a Pytest test suite with 89% code coverage, supporting system reliability and future extensibility.
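The multi-source retrieval and graceful failure handling described above can be illustrated with a minimal sketch. It assumes each source is wrapped as a plain callable that takes a query and returns a list of article dicts; `fetch_all_sources` and the stub fetchers are hypothetical names, not SciFetch's actual LangChain tools. A source that raises is recorded and skipped, records without a title are dropped as incomplete, and duplicates reported by several sources are merged by title.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a source tool maps a query string to a list of article records.
Article = Dict[str, str]
SourceTool = Callable[[str], List[Article]]


def fetch_all_sources(
    query: str, sources: Dict[str, SourceTool]
) -> Tuple[List[Article], Dict[str, str]]:
    """Query every source, skipping any that fail, and merge the results.

    Returns the merged articles plus a map of source name -> error message
    for the sources that failed.
    """
    merged: List[Article] = []
    errors: Dict[str, str] = {}
    seen_titles = set()
    for name, tool in sources.items():
        try:
            results = tool(query)
        except Exception as exc:  # one failing API must not stop the pipeline
            errors[name] = str(exc)
            continue
        for article in results:
            # Skip incomplete records and duplicates already returned by another source.
            title = article.get("title", "").strip().lower()
            if not title or title in seen_titles:
                continue
            seen_titles.add(title)
            merged.append({**article, "source": name})
    return merged, errors


# Stub tools standing in for real API wrappers (hypothetical).
def fake_arxiv(query: str) -> List[Article]:
    return [{"title": "Self-supervised genomics"}, {"title": ""}]


def fake_pubmed(query: str) -> List[Article]:
    raise TimeoutError("PubMed did not respond")


def fake_openalex(query: str) -> List[Article]:
    return [{"title": "Self-Supervised Genomics"}, {"title": "Contrastive DNA models"}]


articles, failures = fetch_all_sources(
    "self-supervised learning in genomics",
    {"arxiv": fake_arxiv, "pubmed": fake_pubmed, "openalex": fake_openalex},
)
print(len(articles), failures)  # 2 {'pubmed': 'PubMed did not respond'}
```

Returning the failures alongside the results, rather than swallowing them silently, keeps the pipeline running while still letting the caller report which sources were unavailable.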
SciFetch integrates multiple academic APIs, each specializing in different scientific domains. Understanding the scope of each can help formulate more effective prompts.
| API | Covered Domains (Examples) |
|---|---|
| arXiv | Hard Sciences & CS: Artificial Intelligence, Physics, Mathematics, Computer Vision, Quantitative Finance |
| CrossRef | General Metadata: Scientific articles from all disciplines including Arts, Law, and Engineering |
| Europe PMC | Biomedical (Europe-focused): Pharmacology, Virology, Bioinformatics, Clinical Trials |
| OpenAlex | Multidisciplinary: Education, Social Sciences, Computer Science, Psychology, Economics |
| PubMed | Biomedical & Life Sciences: Medicine, Genomics, Neuroscience, Public Health |
Prompt Tip:
When querying SciFetch, focus on topics within healthcare, AI, bioinformatics, computer science, or physics, as these are well represented in the integrated repositories.
Niche fields (e.g., Art Theory, Theology) may return sparse or irrelevant results.
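One simple way to act on the coverage table when deciding which sources suit a prompt is a keyword heuristic. This is a hypothetical sketch, not how SciFetch's agent actually selects its tools: the keyword lists are illustrative and drawn loosely from the table, and CrossRef is always appended as a broad fallback because it spans all disciplines.

```python
from typing import Dict, List

# Illustrative keyword lists based on the coverage table (hypothetical, not exhaustive).
DOMAIN_KEYWORDS: Dict[str, List[str]] = {
    "arxiv": ["machine learning", "artificial intelligence", "physics",
              "mathematics", "computer vision", "finance"],
    "pubmed": ["medicine", "genomics", "neuroscience", "public health", "clinical"],
    "europepmc": ["pharmacology", "virology", "bioinformatics", "clinical trial"],
    "openalex": ["education", "social science", "computer science",
                 "psychology", "economics"],
}


def suggest_sources(prompt: str) -> List[str]:
    """Return sources whose keywords appear in the prompt, plus CrossRef as a fallback."""
    text = prompt.lower()
    matches = [
        source
        for source, keywords in DOMAIN_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]
    # CrossRef covers all disciplines, so it is always included.
    return matches + ["crossref"]


print(suggest_sources("Applications of machine learning in genomics"))
# ['arxiv', 'pubmed', 'crossref']
```

A prompt in a niche field such as "Medieval theology" matches no keyword and falls back to CrossRef alone, which mirrors the sparse results the tip above warns about.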
To run SciFetch locally, follow these steps:
- Clone this repository:
```bash
git clone https://github.com/YOUR_USERNAME/sci-fetch.git
cd sci-fetch
```
- (Optional but recommended) Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- Install required Python packages:
```bash
pip install -r requirements.txt
```
- Install frontend dependencies:
```bash
cd frontend
npm install
cd ..
```
- Add your OpenAI key to the `.env` file if needed for local backend execution:
```
OPENAI_API_KEY=your_openai_api_key_here
```
Note: SciFetch is publicly accessible through its deployed frontend/backend setup, and it can also be run locally for development.
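Before starting the backend locally, it can help to confirm that the key is actually visible to Python. This sketch uses only the standard library and assumes the `.env` file holds simple `KEY=value` lines; SciFetch itself may load `.env` differently (for example through a library such as python-dotenv), so `read_env_file` and `get_openai_key` are hypothetical helpers for a quick sanity check.

```python
import os
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_openai_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY", "")
    if not key or key == "your_openai_api_key_here":
        raise RuntimeError("OPENAI_API_KEY is missing or still set to the placeholder value.")
    return key


# Usage: run from the repository root after editing .env.
# print("Key found:", get_openai_key()[:6] + "...")
```

Rejecting the placeholder string catches the common mistake of copying the example line without replacing the value.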
You can use SciFetch through the hosted web application or by running it locally (as a command-line script or as a REST API):
Access the live web application here: https://scifetch.vercel.app
- Enter your research prompt and OpenAI API key.
- The pipeline will fetch and summarize scientific articles and display the generated report preview.
- You can inspect the report preview and download the final PDF artifact.
Launch the agent script and enter your prompt interactively:
```bash
python agents/scientific_fetcher.py
```
You'll be prompted to enter your research query and API key. A PDF file will be saved locally in the `outputs/` folder.
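The report filename appears to be derived from the prompt: in the documented example, "Applications of self-supervised learning in genomics" becomes `applications_of_self_supervised_learning_in_genomics.pdf`. The sketch below reproduces that observed pattern (lowercase, runs of non-alphanumeric characters replaced by underscores) so you can predict where a report will land. `report_path` is a hypothetical helper; the real implementation may treat long prompts or non-ASCII characters differently.

```python
import re
from pathlib import Path


def report_path(prompt: str, output_dir: str = "outputs") -> Path:
    """Build a PDF path from a prompt: lowercase it, replace non-alphanumeric runs with '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    return Path(output_dir) / f"{slug}.pdf"


print(report_path("Applications of self-supervised learning in genomics"))
# outputs/applications_of_self_supervised_learning_in_genomics.pdf (POSIX separator)
```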
You can expose the agent functionality via a local REST API:
```bash
uvicorn app:app --reload
```
Then open the interactive API documentation at http://127.0.0.1:8000/docs.

Send a POST request to the `/run` endpoint with the following JSON body:
```json
{
  "prompt": "Applications of self-supervised learning in genomics",
  "api_key": "your_openai_api_key_here"
}
```
The server will return:
```json
{
  "message": "SciFetch run completed.",
  "filename": "applications_of_self_supervised_learning_in_genomics.pdf",
  "download_url": "http://127.0.0.1:8000/download/applications_of_self_supervised_learning_in_genomics.pdf",
  "output_file": "outputs/applications_of_self_supervised_learning_in_genomics.pdf",
  "html_preview": "<!DOCTYPE html>...",
  "pdf_warning": null
}
```
The `api_key` is required in every request and must be a valid OpenAI key.
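A minimal Python client for the `/run` endpoint, using only the standard library. It assumes the server was started with `uvicorn app:app --reload` on the default host and port, and that the returned `download_url` can be fetched with a plain GET; `build_run_request` and `run_scifetch` are hypothetical helper names, and the timeouts are arbitrary values chosen because a run involves several external APIs plus an LLM call.

```python
import json
import urllib.request
from pathlib import Path

BASE_URL = "http://127.0.0.1:8000"  # default uvicorn host and port


def build_run_request(prompt: str, api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /run request carrying the documented JSON body."""
    body = json.dumps({"prompt": prompt, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scifetch(prompt: str, api_key: str, save_dir: str = ".") -> Path:
    """Run the agent, then download the generated PDF into save_dir."""
    with urllib.request.urlopen(build_run_request(prompt, api_key), timeout=600) as resp:
        result = json.load(resp)
    if result.get("pdf_warning"):
        print("PDF warning:", result["pdf_warning"])
    target = Path(save_dir) / result["filename"]
    with urllib.request.urlopen(result["download_url"], timeout=120) as pdf:
        target.write_bytes(pdf.read())
    return target


# Usage (requires the local server to be running and a real OpenAI key):
# run_scifetch("Applications of self-supervised learning in genomics", "sk-...")
```

Separating request construction from the network call keeps the payload easy to inspect or reuse with other HTTP clients.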
Although SciFetch is functional and publicly accessible, there are several directions for future enhancement:
- LLM Self-Evaluation: Implement article scoring or ranking based on relevance confidence.
- Advanced PDF Formatting: Enhance visual formatting with typographic refinements, tables, or charts.
- API Usage Monitoring: Track rate limits, quota consumption, and per-tool fallback metrics.
- Multilingual Summarization: Allow output generation in languages other than English.
- Tool Expansion: Add support for new academic APIs (e.g., Semantic Scholar, CORE, IEEE Xplore).
- Offline LLM Compatibility: Explore use of local open-source models (e.g., Mistral, LLaMA) for air-gapped environments.
SciFetch is a small but ambitious project, built to help researchers and engineers accelerate the information gathering process.
It is an evolving tool, open for experimentation, extension, or integration into larger pipelines or interfaces.
If you've found this project useful or inspiring, feel free to build on it, break it, or just drop a star.
- Bugs / feature requests: please open an Issue.
- Direct contact: inigo.rodsan@gmail.com
Developed & maintained by Iñigo Rodriguez.