2026.01.19🌟 We’ve released ICLR 2026 predictions powered by the PaperDecision framework—check them out now!
Academic peer review is central to research publishing, yet it remains difficult to model due to its subjectivity, dynamics, and multi-stage decision process. We introduce PaperDecision, an end-to-end framework for modeling and evaluating the peer review process with large language models (LLMs).
Our work makes the following key contributions:

- An agent-based review system. We develop PaperDecision-Agent, a multi-agent framework that simulates authors, reviewers, and area chairs, enabling holistic and interpretable modeling of review dynamics.
- A dynamic benchmark. We construct PaperDecision-Bench, a large-scale multimodal benchmark that links papers, reviews, rebuttals, and final decisions, and is continuously updated with newly released conference rounds to support forward-looking evaluation and avoid data leakage.
- Empirical insights. We achieve up to ~82% accuracy in accept–reject prediction with frontier multimodal LLMs and identify key factors influencing acceptance outcomes, such as reviewer expertise and score changes.
PaperDecision-Agent models key roles in the real-world peer review process through specialized agents. It simulates structured interactions among authors, reviewers, and area chairs to capture the full decision-making workflow.
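For intuition, here is a minimal sketch of how such role-specific agents could be wired together. The class names, prompts, and `llm_call` helper below are hypothetical illustrations, not the interface of the released code.

```python
# Minimal sketch of a role-based review simulation (hypothetical names, not the released API).
from dataclasses import dataclass, field


def llm_call(role_prompt: str, context: str) -> str:
    """Placeholder for any LLM backend; swap in your preferred client."""
    raise NotImplementedError


@dataclass
class Agent:
    role: str     # "author", "reviewer", or "area_chair"
    persona: str  # e.g. a short description of reviewer expertise

    def act(self, context: str) -> str:
        return llm_call(f"You are a {self.role}. {self.persona}", context)


@dataclass
class ReviewSimulation:
    paper: str
    reviewers: list
    transcript: list = field(default_factory=list)

    def run(self, author: Agent, area_chair: Agent) -> str:
        # 1) Each reviewer drafts an initial review of the paper.
        reviews = [r.act(self.paper) for r in self.reviewers]
        self.transcript += reviews
        # 2) The author responds to the reviews (rebuttal stage).
        rebuttal = author.act("\n\n".join(reviews))
        self.transcript.append(rebuttal)
        # 3) The area chair weighs reviews and rebuttal and issues a decision.
        decision = area_chair.act("\n\n".join(self.transcript))
        return decision  # e.g. "Accept" or "Reject"
```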
PaperDecision-Bench is a dynamic and continually extending evaluation framework explicitly aligned with the evolving ICLR peer review process, rather than a static dataset. By grounding evaluation in future decision prediction and cross-year extension, the benchmark is inherently resistant to benchmark-specific overfitting and better reflects real-world conference usage scenarios.
To balance accessibility and realism, PaperDecision-Bench adopts a three-tier evaluation design (a small configuration sketch follows the list):

- B1: Future Prediction. Targets ICLR 2026 decision prediction, where models observe papers and reviews while final outcomes remain hidden, serving as a gold-standard test of cross-temporal generalization.
- B2: Retrospective. Covers complete ICLR 2023–2025 data for robust retrospective evaluation, enabling reliable model comparison and systematic error analysis.
- B3: MiniSet-1K. Provides a cost-efficient benchmark focusing on MLLM, 3D, and RL papers with ambiguous decision boundaries, supporting rapid iteration and analysis.
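As a rough illustration of how these tiers could be selected programmatically, the snippet below encodes the three splits as a simple config. The split names, field names, and the 1,000-paper size for MiniSet-1K are assumptions for illustration, not the benchmark's actual interface.

```python
# Hypothetical description of the three benchmark tiers; names and fields are illustrative only.
BENCH_TIERS = {
    "B1-future": {"years": [2026], "labels_available": False},  # final decisions hidden
    "B2-retrospective": {"years": [2023, 2024, 2025], "labels_available": True},
    "B3-miniset-1k": {"topics": ["MLLM", "3D", "RL"], "labels_available": True,
                      "num_papers": 1000},
}


def load_tier(name: str) -> dict:
    """Look up a tier config; real loading logic would read the corresponding data files."""
    cfg = BENCH_TIERS[name]
    print(f"{name}: labels available = {cfg['labels_available']}")
    return cfg


load_tier("B3-miniset-1k")
```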
All data in PaperDecision-Bench are sourced from OpenReview, and the benchmark will be continuously updated as new conference rounds are released.
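Since the data come from OpenReview, you can also pull submissions yourself. The snippet below is a minimal sketch assuming the `openreview-py` API v2 client and the standard `venueid` query convention; check the OpenReview documentation for exact parameters and any authentication requirements.

```python
# Sketch: fetching ICLR submissions from OpenReview (assumes the openreview-py API v2 client).
import openreview

client = openreview.api.OpenReviewClient(baseurl="https://api2.openreview.net")

# Public submissions for a given venue; the venueid convention is assumed here.
submissions = client.get_all_notes(content={"venueid": "ICLR.cc/2025/Conference"})
print(f"Fetched {len(submissions)} ICLR 2025 submissions")
```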
```bash
# Run the multi-agent review simulation
python multi_agent.py

# Compute decision-prediction metrics
python evaluation_metric.py

# Analyze results and influencing factors
python analysis.py
```
If you find our work helpful for your research, please consider citing it.
```bibtex
@misc{PaperDecision2026,
  author       = {Zhang, Yi-Fan and Dong, Yuhao and Zhang, Saining and Wu, Kai and Wang, Liang and Shan, Caifeng and Liu, Ziwei and He, Ran and Zhao, Hao and Fu, Chaoyou},
  title        = {ICLR 2026 Acceptance Prediction: Benchmarking Decision Process with a Multi-Agent System},
  howpublished = {\url{https://github.com/PaperDecision/PaperDecision}},
  year         = {2026},
  note         = {Accessed: 2026-01-18}
}
```