SciFetch: Autonomous Agent for Scientific Literature Retrieval

SciFetch is an autonomous AI agent designed to search, synthesize, and generate scientific literature reports based on natural language prompts.

It combines modern AI and web technologies: LangChain for orchestration, OpenAI models for summarization, and academic APIs for up-to-date content retrieval. The final output is delivered as a styled, downloadable PDF report, accessible via a clean web interface.

Try it live at: https://scifetch.vercel.app

Introduction

SciFetch is a full-stack autonomous system designed to assist researchers and professionals in exploring scientific literature efficiently through AI-powered summarization and presentation.

Born from the need to automate scientific information retrieval, SciFetch queries multiple trusted academic APIs, extracts relevant publications, and synthesizes a human-readable report in PDF format.

It combines the orchestration capabilities of LangChain, the language generation power of OpenAI models, and a web-friendly interface built with Next.js to offer:

Fast and structured access to scientific knowledge.
Reliable summarization of complex topics from multiple sources.
Ready-to-use, visually styled PDF reports.
Deployment flexibility, with both API and browser access.

The platform serves both as a research assistant and as a proof of concept for combining autonomous retrieval, modern web development, and scientific APIs into an end-to-end application.

Documentation

Additional technical documentation is available in the /docs directory.

ARCHITECTURE.md
Provides a detailed overview of the repository structure, backend and frontend components, API integrations, and the main technical building blocks of SciFetch.

This document is intended for readers who want a deeper technical view of how the system is organized beyond the high-level project overview in this README.

Key Features

Autonomous Literature Agent: Uses a retrieval pipeline with source-specific tools to query the most relevant academic APIs for each request.
Multi-Source Scientific Retrieval: Aggregates results from PubMed, arXiv, OpenAlex, Europe PMC, and CrossRef to ensure coverage and diversity.
LLM-Powered Summarization: Synthesizes complex, multi-source information into a cohesive and accessible summary using OpenAI models.
Styled PDF Report Generation: Outputs are delivered as downloadable, professionally formatted PDF documents using a shared HTML report template rendered through a high-fidelity PDF workflow.
Modern Web Interface: Built with Next.js, the frontend allows users to submit research prompts, inspect the generated report preview, and download the final PDF directly from the browser.
Full Public Deployment: The backend is deployed on Render and the frontend on Vercel, providing instant access at https://scifetch.vercel.app.
Graceful Failure Handling: If one API fails or returns incomplete data, the pipeline continues processing with the remaining sources.
Secure API Usage: Requires an OpenAI API key, sent per request and handled server-side during execution.
Tested for Robustness: Includes a Pytest test suite with 89% code coverage to support system reliability and future extensibility.

Scientific Domain Coverage

SciFetch integrates multiple academic APIs, each specializing in different scientific domains. Understanding the scope of each can help formulate more effective prompts.

API	Covered Domains (Examples)
arXiv	Hard Sciences & CS: Artificial Intelligence, Physics, Mathematics, Computer Vision, Quantitative Finance
CrossRef	General Metadata: Scientific articles from all disciplines including Arts, Law, and Engineering
Europe PMC	Biomedical (Europe-focused): Pharmacology, Virology, Bioinformatics, Clinical Trials
OpenAlex	Multidisciplinary: Education, Social Sciences, Computer Science, Psychology, Economics
PubMed	Biomedical & Life Sciences: Medicine, Genomics, Neuroscience, Public Health

Prompt Tip:
When querying SciFetch, focus on topics within healthcare, AI, bioinformatics, computer science, or physics, as these are well represented in the integrated repositories.
Niche fields (e.g., Art Theory, Theology) may return sparse or irrelevant results.

Installation

To run SciFetch locally, follow these steps:

Clone this repository:

git clone https://github.com/YOUR_USERNAME/sci-fetch.git
cd sci-fetch

(Optional but recommended) Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install required Python packages:

pip install -r requirements.txt

Install frontend dependencies:

cd frontend
npm install
cd ..

Add your OpenAI key to the .env file if needed for local backend execution:

OPENAI_API_KEY=your_openai_api_key_here

Note: SciFetch is publicly accessible through its deployed frontend/backend setup, and it can also be run locally for development.

Usage

You can use SciFetch in two ways:

1. Use the Web App (recommended)

Access the live web application here:

https://scifetch.vercel.app

Enter your research prompt and OpenAI API key.
The pipeline fetches and summarizes scientific articles, then displays a preview of the generated report.
Inspect the preview and download the final PDF.

2. Run Locally via CLI or API (advanced)

Option A: Run the agent via CLI

Launch the agent script and enter your prompt interactively:

python agents/scientific_fetcher.py

You'll be prompted to enter your research query and API key. A PDF file will be saved locally in the outputs/ folder.
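
The output filename appears to be derived from the prompt: in the API example in this README, the prompt "Applications of self-supervised learning in genomics" yields applications_of_self_supervised_learning_in_genomics.pdf. A minimal sketch of one naming rule that reproduces that result is shown here. The function name report_path and the exact rule (lowercase, runs of non-alphanumeric characters replaced by an underscore) are assumptions, not the project's confirmed implementation.

```python
import re
from pathlib import Path


def report_path(prompt: str, output_dir: str = "outputs") -> Path:
    """Build a filesystem-safe PDF path from a free-text prompt."""
    # Lowercase, then collapse every run of non-alphanumerics into "_".
    slug = re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")
    # Fall back to a fixed name if the prompt had no usable characters.
    return Path(output_dir) / f"{slug or 'report'}.pdf"
```

Under this rule, two prompts that differ only in punctuation or letter case map to the same file, so a later run would overwrite the earlier report.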

Option B: Run the FastAPI server

You can expose the agent functionality via a local REST API:

uvicorn app:app --reload

Then open the interactive API documentation at:

http://127.0.0.1:8000/docs

Send a POST request to the /run endpoint with the following JSON:

{
  "prompt": "Applications of self-supervised learning in genomics",
  "api_key": "your_openai_api_key_here"
}

The server will return:

{
  "message": "SciFetch run completed.",
  "filename": "applications_of_self_supervised_learning_in_genomics.pdf",
  "download_url": "http://127.0.0.1:8000/download/applications_of_self_supervised_learning_in_genomics.pdf",
  "output_file": "outputs/applications_of_self_supervised_learning_in_genomics.pdf",
  "html_preview": "<!DOCTYPE html>...",
  "pdf_warning": null
}

The api_key is required in every request and must be a valid OpenAI key.
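
For scripted use, the request and response above can be handled with a small client. The sketch below uses only the Python standard library; the helper names (build_run_request, describe_result, run_scifetch, main) and the 300-second timeout are assumptions for illustration, while the URL, JSON fields, and response keys are the ones shown in this README.

```python
import json
import os
import urllib.request
from typing import Any, Dict

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn address from this README


def build_run_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Create the POST /run request with the JSON body shown above."""
    body = json.dumps({"prompt": prompt, "api_key": api_key}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_result(result: Dict[str, Any]) -> str:
    """Turn the /run response into a one-line status message."""
    line = f"{result['message']} PDF: {result['download_url']}"
    if result.get("pdf_warning"):  # null in the example response
        line += f" (warning: {result['pdf_warning']})"
    return line


def run_scifetch(prompt: str, api_key: str, timeout: float = 300.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (server must be running)."""
    request = build_run_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def main() -> None:
    """Run one query and save the PDF next to this script."""
    # Assumes the key is exported in the shell as OPENAI_API_KEY.
    result = run_scifetch(
        "Applications of self-supervised learning in genomics",
        os.environ["OPENAI_API_KEY"],
    )
    print(describe_result(result))
    urllib.request.urlretrieve(result["download_url"], result["filename"])
```

Calling main() with the local server running sends one request and downloads the resulting PDF. The generous timeout is a guess: a run involves several API calls plus summarization, so a short default could cut it off.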

Future Work

Although SciFetch is functional and publicly accessible, there are several directions for future enhancement:

LLM Self-Evaluation: Implement article scoring or ranking based on relevance confidence.
Advanced PDF Formatting: Enhance visual formatting with typographic refinements, tables, or charts.
API Usage Monitoring: Track rate limits, quota consumption, and per-tool fallback metrics.
Multilingual Summarization: Allow output generation in languages other than English.
Tool Expansion: Add support for new academic APIs (e.g., Semantic Scholar, CORE, IEEE Xplore).
Offline LLM Compatibility: Explore use of local open-source models (e.g., Mistral, LLaMA) for air-gapped environments.

Contributing & Contact

SciFetch is a small but ambitious project, built to help researchers and engineers accelerate the information gathering process.
It is an evolving tool, open for experimentation, extension, or integration into larger pipelines or interfaces.

If you've found this project useful or inspiring, feel free to build on it, break it, or just drop a star.

Bugs / feature requests: please open an Issue.
Direct contact: inigo.rodsan@gmail.com

Developed & maintained by Iñigo Rodriguez.
