Idea2Paper is an end-to-end research agent framework that aims to systematically define and analyze the major stages of the contemporary research process, along with the core challenges inherent to each stage. Rather than treating paper writing as a monolithic generation problem, Idea2Paper explicitly decomposes scientific research into structured phases and identifies critical bottlenecks that hinder the transformation of raw ideas into coherent, submission-ready academic narratives. Through this analysis, Idea2Paper highlights that one of the most fundamental yet underexplored challenges lies in research paradigm generation: the process of converting an underspecified research idea into a logically consistent, academically grounded research story. Existing systems often struggle to produce stable and reusable research paradigms, especially when reasoning is performed entirely at runtime and under limited contextual grounding.
To address these challenges in a principled and engineering-oriented manner, Idea2Paper adopts a modular system design. Instead of immediately building a fully end-to-end writing system, the project prioritizes the construction of targeted engineering submodules that tackle specific bottlenecks in the research pipeline. As the first and core engineering submodule, Idea2Story is introduced to directly address the problem of research paradigm generation. Idea2Story focuses on transforming underspecified research ideas into complete, coherent, and submission-ready scientific narrative skeletons. By providing a structured research story as an intermediate representation, Idea2Story establishes a stable foundation for downstream stages such as method development, experiment design, and paper writing.
Idea2Paper : https://www.researchgate.net/publication/400280248_Idea2Paper_What_Should_an_End-to-End_Research_Agent_Really_Do
Idea2Story introduces a pre-computation-driven framework that shifts literature understanding from runtime reasoning to offline knowledge graph construction, enabling more efficient and reliable autonomous scientific discovery.
Idea2Story : https://arxiv.org/abs/2601.20833
- Knowledge-Driven: Uses ICLR data to build a comprehensive knowledge graph.
- Auditable Review: Implements an anchored multi-agent review system for objective feedback.
- Automated Refinement: Includes RAG deduplication and intelligent revision to enhance novelty.
| WeChat Group | Discord Channel |
|---|---|
| | https://discord.gg/FfXtbREb |
- Knowledge Graph: Built from ICLR data with Idea/Pattern/Domain/Paper nodes.
- Advanced Retrieval: Three-path retrieval (Idea/Domain/Paper) with two-stage ranking (Jaccard + Embedding).
- Idea2Story Generation: From pattern selection to story generation, anchored review, and smart correction.
- Anchored Multi-Agent Review: Uses real review statistics as anchors for relative comparisons, producing deterministic and auditable 1-10 scores.
- Comprehensive Logging: Per-run structured logs for full reproducibility and auditing.
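The two-stage ranking named in the feature list (Jaccard + Embedding) could look roughly like the sketch below. This is an illustration, not the project's retrieval code: the ordering (cheap Jaccard filter first, embedding cosine rerank second), the function names, the candidate dictionary layout, and the `coarse_k` / `final_k` parameters are all assumptions.

```python
import math
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap: |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_stage_rank(query_tokens: Set[str], query_vec: Sequence[float],
                   candidates: List[Dict], coarse_k: int = 50,
                   final_k: int = 5) -> List[str]:
    """Hypothetical two-stage ranking: Jaccard prefilter, then cosine rerank."""
    # Stage 1 (assumed): keep the coarse_k candidates with the best token overlap.
    coarse = sorted(candidates,
                    key=lambda c: jaccard(query_tokens, c["tokens"]),
                    reverse=True)[:coarse_k]
    # Stage 2 (assumed): rerank the survivors by embedding similarity.
    reranked = sorted(coarse,
                      key=lambda c: cosine(query_vec, c["vec"]),
                      reverse=True)
    return [c["id"] for c in reranked[:final_k]]
```

The design idea is that the set-based score is cheap enough to run over many candidates, while the embedding comparison is reserved for the shortlist.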
- `Paper-KG-Pipeline/output/final_story.json`: Final structured Story (title/abstract/problem/method/contribs/experiments).
- `Paper-KG-Pipeline/output/pipeline_result.json`: Full pipeline trace (reviews, corrections, audits).
- `log/run_.../`: Structured logs for every run.
- Python 3.10+
`pip install -r Paper-KG-Pipeline/requirements.txt`
Note: The embedding model is configurable via `EMBEDDING_MODEL` / `EMBEDDING_API_URL` (env or `i2p_config.json`). If you switch models, rebuild the novelty/recall indexes or use model-specific index directories to avoid a mismatch.
Constraint: the embedding dimension must match your index; if you switch models, rebuild indexes or use model-specific index dirs.
Recommended (auto_profile): set `I2P_INDEX_DIR_MODE=auto_profile` to auto-map each embedding model to its own index dirs: `Paper-KG-Pipeline/output/novelty_index__{model}` and `.../recall_index__{model}`.
Explicit `I2P_NOVELTY_INDEX_DIR` / `I2P_RECALL_INDEX_DIR` (env or `i2p_config.json`) override auto_profile.
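The precedence described above (explicit directories win over auto_profile) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual resolver: `resolve_index_dirs` is a hypothetical helper, the way the model name is turned into a folder suffix is a guess, and returning `None` stands in for whatever default the pipeline applies when neither setting is present.

```python
from pathlib import Path
from typing import Dict, Mapping, Optional


def resolve_index_dirs(env: Mapping[str, str],
                       output_root: str = "Paper-KG-Pipeline/output"
                       ) -> Dict[str, Optional[Path]]:
    """Hypothetical resolver: explicit dirs first, then auto_profile, else None."""
    model = env.get("EMBEDDING_MODEL", "")
    # Assumption: unsafe path characters in the model name become "_".
    safe_model = "".join(ch if ch.isalnum() or ch in "-_." else "_" for ch in model)

    resolved: Dict[str, Optional[Path]] = {}
    for kind, explicit_var in (("novelty", "I2P_NOVELTY_INDEX_DIR"),
                               ("recall", "I2P_RECALL_INDEX_DIR")):
        explicit = env.get(explicit_var)
        if explicit:
            # An explicit directory overrides auto_profile.
            resolved[kind] = Path(explicit)
        elif env.get("I2P_INDEX_DIR_MODE") == "auto_profile" and safe_model:
            # One index directory per embedding model.
            resolved[kind] = Path(output_root) / f"{kind}_index__{safe_model}"
        else:
            # Unknown here: the pipeline's own default location.
            resolved[kind] = None
    return resolved
```

In real use the mapping passed in would typically be `os.environ`, possibly merged with values read from `i2p_config.json`.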
Tip (speed/stability): set `I2P_ANCHOR_DENSIFY_ENABLE=0` to skip Adaptive Densify; otherwise Phase 3 Critic can be much slower and may fail due to strict JSON validation.
Tip (debug): if you repeatedly hit Critic JSON errors, set `I2P_CRITIC_STRICT_JSON=0` (or `critic.strict_json=false`) to disable strict mode and allow fallback.
Tip (LLM temperature): per-stage temperatures are configurable via `I2P_LLM_TEMPERATURE_*` or `llm.temperature.*`; defaults preserve current behavior. Critic is usually low temp for stability, while story generation can be moderate.
Tip (Idea Packaging): optional quality boost via pattern-guided idea packaging + double recall (default off). Enable with `I2P_IDEA_PACKAGING_ENABLE=1` or `idea.packaging_enable=true`.
Tip (Subdomain taxonomy): optional quality boost for Path2 to reduce duplicated/long-tail subdomains. When enabled, the pipeline auto-detects and (if `I2P_INDEX_ALLOW_BUILD=1`) auto-builds `subdomain_taxonomy.json` under `recall_index_dir` (recommended: leave `I2P_SUBDOMAIN_TAXONOMY_PATH` empty). First build uses batched embeddings; you can also build manually via `Paper-KG-Pipeline/scripts/tools/build_subdomain_taxonomy.py`.
Supported (no code changes): OpenAI-compatible Embeddings APIs (`/v1/embeddings`) that accept `input` as a string or a list.
Not supported yet: DashScope "native" embeddings endpoint (`/api/v1/services/embeddings/...`) requires an adapter.
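To make the supported request shape concrete, the sketch below builds an OpenAI-style `/v1/embeddings` request body for either a single string or a list, and checks returned vectors against the index dimension mentioned in the constraint above. It performs no network call. The helper names, the `base_url` argument, and the example host are hypothetical; the response layout (`data` items carrying `index` and `embedding`) follows the OpenAI embeddings format, and a given compatible provider may differ.

```python
import json
from typing import List, Sequence, Tuple, Union


def build_embeddings_request(base_url: str, model: str,
                             inputs: Union[str, Sequence[str]]) -> Tuple[str, str]:
    """Return (url, json_body) for an OpenAI-compatible embeddings call."""
    # "input" may be one string or a list of strings.
    payload_input = inputs if isinstance(inputs, str) else list(inputs)
    url = base_url.rstrip("/") + "/v1/embeddings"
    body = json.dumps({"model": model, "input": payload_input})
    return url, body


def extract_vectors(response: dict, index_dim: int) -> List[List[float]]:
    """Pull vectors out of a response and verify they fit the index dimension."""
    # Sort by "index" so vectors line up with the order of the inputs.
    items = sorted(response["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for vec in vectors:
        if len(vec) != index_dim:
            raise ValueError(
                f"embedding dimension {len(vec)} does not match "
                f"index dimension {index_dim}; rebuild the index or "
                f"use a model-specific index directory"
            )
    return vectors
```

A dimension check of this kind fails fast after a model switch instead of surfacing later as a retrieval error.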
DATA
If you need to use the prebuilt local index, place the two folders in paper-embedding from Hugging Face into Paper-KG-Pipeline/output:

Paper-KG-Pipeline/
└── output/
    ├── recall_index__{model}/
    └── novelty_index__{model}/

Make sure the embedding model matches the index you downloaded; otherwise errors may occur.
Migration note (auto_profile naming change): if you previously used provider/urlhash-based dirs, you can either (A) rename the old folders to `recall_index__{model}` / `novelty_index__{model}`, or (B) keep old folder names and set `I2P_RECALL_INDEX_DIR` / `I2P_NOVELTY_INDEX_DIR` explicitly to those paths.
- Copy `.env.example` to `.env` and fill in `LLM_API_KEY` (and optionally `LLM_PROVIDER`, `LLM_BASE_URL`).
- (Optional) Copy `i2p_config.example.json` to `i2p_config.json` to tweak settings.
`python Paper-KG-Pipeline/scripts/idea2story_pipeline.py "your research idea"`
Status: The frontend is currently unstable. We recommend running the pipeline from the terminal for now. We will improve the frontend in future updates.
Run a minimal local UI to launch the pipeline and view only high-level stage + final results (no raw logs on screen).
`python frontend/server/app.py --host 127.0.0.1 --port 8080`
Open in your browser:
http://127.0.0.1:8080/
- Run the same pipeline entrypoint (`idea2story_pipeline.py`) from a web page.
- Configure `LLM_API_KEY`, `LLM_PROVIDER`, `LLM_BASE_URL` / `LLM_API_URL`, `LLM_MODEL` for the current run (not persisted by the server).
- Toggle Novelty / Verification.
- Download the current run logs as a zip.
For more details, see frontend/README.md.
output/
├── final_story.json       # Final generated paper story
├── pipeline_result.json   # Full pipeline results
└── log.json               # Detailed logs
Check final_story.json for the result and pipeline_result.json for the full process.
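A quick way to inspect the result is to load `final_story.json` and see which story sections it contains. The sketch below assumes the file is a JSON object whose top-level keys match the section names listed earlier (title, abstract, problem, method, contribs, experiments). The real key names may differ, and both helper functions are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List

# Assumed key names, taken from the section list in this README.
EXPECTED_SECTIONS = ("title", "abstract", "problem", "method",
                     "contribs", "experiments")


def load_story(path: str) -> dict:
    """Read a story file such as Paper-KG-Pipeline/output/final_story.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize_story(story: dict) -> Dict[str, List[str]]:
    """Report which expected sections are present and which are missing."""
    present = [key for key in EXPECTED_SECTIONS if key in story]
    missing = [key for key in EXPECTED_SECTIONS if key not in story]
    return {"present": present, "missing": missing}
```

For example, `summarize_story(load_story("Paper-KG-Pipeline/output/final_story.json"))` would list any sections the run did not produce under `"missing"`.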
Instead of arbitrary scores, this project uses anchored comparisons. We select anchor papers with known scores, ask LLMs to compare your target against these anchors (better/tie/worse), and then deterministically fit a final numeric score. This ensures the review process is auditable and grounded in real-world data.
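One way such a deterministic fit could work is sketched below. It is not the project's scoring code: it assumes a logistic (Bradley-Terry-style) model in which the chance that the target beats an anchor grows with the score gap, treats a tie as a half win, and picks the score on a fixed 1-10 grid that minimizes cross-entropy. The function name, the `scale` parameter, and the grid step are all hypothetical.

```python
import math
from typing import List, Tuple

# Map each LLM verdict to a target probability that the paper beats the anchor.
OUTCOME_TARGET = {"better": 1.0, "tie": 0.5, "worse": 0.0}


def fit_anchored_score(comparisons: List[Tuple[float, str]],
                       lo: float = 1.0, hi: float = 10.0,
                       step: float = 0.1, scale: float = 1.0) -> float:
    """Grid-search the score that best explains the better/tie/worse verdicts.

    comparisons: (anchor_score, outcome) pairs, outcome in OUTCOME_TARGET.
    """
    best_score, best_loss = lo, float("inf")
    n_steps = int(round((hi - lo) / step))
    for i in range(n_steps + 1):
        score = lo + i * step
        loss = 0.0
        for anchor_score, outcome in comparisons:
            # Assumed model: P(target beats anchor) = sigmoid(score gap / scale).
            p = 1.0 / (1.0 + math.exp(-(score - anchor_score) / scale))
            p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite
            y = OUTCOME_TARGET[outcome]
            loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
        # Strict improvement only, so ties resolve to the lowest score.
        if loss < best_loss - 1e-12:
            best_score, best_loss = score, loss
    return round(best_score, 1)
```

Because the search is exhaustive over a fixed grid with no randomness, the same set of comparisons always yields the same score, which is the property that makes a review of this kind auditable.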
- Core Code: `Paper-KG-Pipeline/src/idea2paper/`
- Documentation:
| No. | Document | Content | Target Audience |
|---|---|---|---|
| 0 | Project Overview | Overall architecture, core modules, parameter configuration, execution workflow | Everyone |
| 1 | Knowledge Graph Construction | Data sources, node/edge definitions, LLM enhancement, how to run | Developers |
| 2 | Retrieval System | Three-way retrieval strategies, similarity computation, performance optimization | Developers |
| 3 | Idea2Story Pipeline | Pattern selection, Idea fusion, story reflection, critic review | Developers |
- Review Details: MULTIAGENT_REVIEW.md
We welcome PRs and Issues! Please follow the contribution guidelines. Licensed under the MIT License.
- Data Source: ICLR (see KG construction docs)
- Inspiration: Auditable, anchor-centered review processes.
- Community Support: agentAlpha Community
If you find Idea2Story useful, please cite:
@misc{xu2026idea2storyautomatedpipelinetransforming,
title={Idea2Story: An Automated Pipeline for Transforming Research Concepts into Complete Scientific Narratives},
author={Tengyue Xu and Zhuoyang Qian and Gaoge Liu and Li Ling and Zhentao Zhang and Biao Wu and Shuo Zhang and Ke Lu and Wei Shi and Ziqi Wang and Zheng Feng and Yan Luo and Shu Xu and Yongjin Chen and Zhibo Feng and Zhuo Chen and Bruce Yuan and Harry Wang and Kris Chen},
year={2026},
eprint={2601.20833},
archivePrefix={arXiv},
primaryClass={cs.CE},
url={https://arxiv.org/abs/2601.20833}
}
@article{xu2026idea2paper,
title={Idea2Paper: What Should an End-to-End Research Agent Really Do?},
author={Xu, Tengyue and Qian, Zhuoyang and Liu, Gaoge and Zhang, Zhentao and Ling, Li and Wu, Biao and Zhang, Shuo and Lu, Ke and Shi, Wei and Wang, Ziqi and others},
year={2026}
}


