Ziran Yang

I am a first-year Ph.D. student in the Department of Electrical and Computer Engineering at Princeton University, advised by Prof. Chi Jin. Previously, I completed my undergraduate studies at Yuanpei College, Peking University. I am interested in the intersection of RL and LLMs, especially certifiable reasoning.

Email / Google Scholar / GitHub

Recent Interests

I view exploration as the core challenge in RL, and test-time search as necessary to achieve it for LLM agents. The key technical problem is how search procedures and expert decision-making systems can be internalized as reasoning ability, rather than remaining external scaffolding. I see two tightly coupled aspects: backfilling expert search behavior into the model through learning, and on-the-fly calibration that lets the model assess its uncertainty and decide when to search, explore, or trust its own prediction.

Publications

	Goedel-Prover-V2: The Strongest Open-Source Theorem Prover to Date. Yong Lin, Shange Tang, Bohan Lyu, Ziran Yang, Jui-Hui Chung, Haoyu Zhao, Lai Jiang, Yihan Geng, Jiawei Ge, Jingruo Sun, Jiayun Wu, Jiri Gesi, David Acuna, Kaiyu Yang, Hongzhou Lin, Yejin Choi, Danqi Chen, Sanjeev Arora, Chi Jin. AI4MATH@ICML 2025 (Oral), ICLR 2026
	ALGOVERI: An Aligned Benchmark for Verified Code Generation on Classical Algorithms. Haoyu Zhao, Ziran Yang, Jiawei Li, Deyuan He, Zenan Li, Chi Jin, Venugopal V. Veeravalli, Aarti Gupta, Sanjeev Arora. arXiv
	Understanding the Sources of Uncertainty for Large Language and Multimodal Models. Ziran Yang, Shibo Hao, Hao Sun, Lai Jiang, Qiyue Gao, Yian Ma, Zhiting Hu. ICLR 2025 Workshop: Quantify Uncertainty and Hallucination in Foundation Models
	From Uncertainty to Trust: Enhancing Reliability in Vision-Language Models with Uncertainty-Guided Dropout Decoding. Yixiong Fang, Ziran Yang, Zhaorun Chen, Zhuokai Zhao, Jiawei Zhou. NeurIPS 2025
	Evolving Diverse Red-team Language Models in Multi-round Multi-agent Games. Chengdong Ma, Ziran Yang, Hai Ci, Jun Gao, Minquan Gao, Xuehai Pan, Yaodong Yang. arXiv
	Panacea: Pareto Alignment via Preference Adaptation for LLMs. Yifan Zhong, Chengdong Ma, Xiaoyuan Zhang, Ziran Yang, Qingfu Zhang, Siyuan Qi, Yaodong Yang. NeurIPS 2024
	SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset. Josef Dai, Tianle Chen, Xuyao Wang, Ziran Yang, Taiye Chen, Jiaming Ji, Yaodong Yang. NeurIPS 2024 (DB Track)
	Offline Reinforcement Learning for LLM Multi-Step Reasoning. Huaijie Wang, Shibo Hao, Hanze Dong, Shenao Zhang, Yilin Bao, Ziran Yang, Yi Wu. ACL 2025; ICLR 2025 Workshop: Reasoning and Planning for LLMs (Oral)

Experience

	ByteDance Seed, 2025.05 - 2025.08. Research Intern. Working on RL for tool-using agentic LLMs.
	UC San Diego, 2024.04 - 2024.11. Research Intern. Advisor: Prof. Zhiting Hu
	PAIR Lab: PKU Alignment and Interaction Research Lab, 2023.05 - Present. Research Intern. Advisor: Prof. Yaodong Yang

Services

Reviewer: NeurIPS 2024, ICLR 2025, AISTATS 2025, ICML 2025, NeurIPS 2025, AAAI 2026.

Selected Awards

2024: Peking University Excellent Undergraduate Research Award

2024: SenseTime Scholarship Nomination Award

2024: Fifth Yuanpei Young Scholar Award

2021: Peking University Freshman Scholarship

2019: Ministry of Education Talent Program: Annual Outstanding Thesis

This template is a modification of Jon Barron's website.