Previously, I completed my MPhil in Machine Learning at the University of Cambridge, advised by Samuel Albanie and Bill Byrne.
I did my undergraduate studies at the University of Toronto, advised by Roger Grosse and Sven Dickinson,
and was a researcher at the Vector Institute.
I'm working on human-AI alignment & safety. My long-term goal is to build trustworthy AI systems that effectively assist humans. Previously, I also did research on video retrieval and out-of-distribution generalization. I'm always open to collaboration; if our interests align, please feel free to drop me an email.
- I've been considering these questions on human-AI alignment & safety:
Are we underestimating the ways LLMs might subtly influence daily human decisions?
Do we recognize that certain harms might surface long after the interaction?
How should our alignment algorithms account for these risks?
- I've been considering these questions on AI agents: Are LLMs capable enough for reliable planning?
Do their actions truly align with human intentions? How can we ensure agents act safely, beneficially, and in alignment with human needs?
We found that RLHF can induce significant misalignment when humans provide feedback by implicitly predicting future outcomes, which creates incentives for LLM deception. To address this, we propose RLHS (Hindsight Simulation): by simulating the future outcomes of an interaction before feedback is given, we drastically reduce misalignment.
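To make the idea concrete, here is a minimal sketch of a single hindsight-feedback step. The function names and the toy scoring rule are hypothetical placeholders for illustration, not the paper's implementation.

```python
# Toy sketch of one hindsight-feedback step (RLHS-style).
# All components below are hypothetical placeholders: in practice the policy,
# the outcome simulator, and the rater would be LLMs and/or humans.

def respond(conversation):
    # placeholder policy: the assistant's reply to the dialogue so far
    return "I recommend option A; it has every feature you asked about."

def simulate_outcome(conversation, response, horizon=5):
    # placeholder world model: roll the interaction forward so the downstream
    # consequences of the advice become observable before feedback is given
    return "user bought option A and later found it lacked the needed feature"

def rate_in_hindsight(conversation, response, outcome):
    # placeholder rater: scores the response given the simulated outcome,
    # not just how helpful it sounds in the moment
    return -1.0 if "lacked" in outcome else 1.0

conversation = ["user: which option should I buy?"]
response = respond(conversation)
outcome = simulate_outcome(conversation, response)
reward = rate_in_hindsight(conversation, response, outcome)
print(reward)  # this hindsight reward would then drive RLHF-style fine-tuning
```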
We introduced Branch and Play (B&P), an algorithm that efficiently solves multi-agent spatial navigation problems by determining the optimal order of play.
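As a toy illustration of why the order of play matters: the corridor-crossing scenario and brute-force enumeration below are made up for clarity and are not the B&P algorithm itself.

```python
from itertools import permutations

# Hypothetical scenario: two robots must cross a narrow corridor one at a time,
# and the later mover plans around whoever has already committed a trajectory.
# The chosen order of play changes the total (social) cost.

travel_time = {"robot_A": 3.0, "robot_B": 5.0}  # unobstructed crossing times

def social_cost(order):
    """Sum of arrival times when agents plan sequentially in the given order."""
    corridor_free_at, total = 0.0, 0.0
    for agent in order:
        arrival = corridor_free_at + travel_time[agent]  # wait, then cross
        total += arrival
        corridor_free_at = arrival
    return total

best = min(permutations(travel_time), key=social_cost)
print(best, social_cost(best))  # ('robot_A', 'robot_B') 11.0: faster robot first
```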
We demonstrated that equilibrium models generalize better to harder instances when they are path independent, highlighting the importance of path independence for model performance and scalability.
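For intuition, path independence means the model reaches the same equilibrium no matter where the fixed-point iteration starts. The numerical sketch below uses a made-up contraction as the implicit layer, not an actual trained equilibrium model.

```python
import numpy as np

# Minimal illustration of path independence in an equilibrium (fixed-point) model.
# The implicit layer f below is a made-up contraction for demonstration; a real
# equilibrium model solves z* = f(z*, x) with a learned f.

rng = np.random.default_rng(0)
d = 8
W = 0.4 * rng.standard_normal((d, d)) / np.sqrt(d)  # small weights -> contraction
U = rng.standard_normal((d, d)) / np.sqrt(d)
x = rng.standard_normal(d)

def f(z, x):
    return np.tanh(W @ z + U @ x)  # one implicit layer, iterated to a fixed point

def solve(z0, iters=500):
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

z_from_zeros = solve(np.zeros(d))
z_from_noise = solve(rng.standard_normal(d))
# Path independence: both initializations converge to (numerically) the same z*,
# so harder instances can simply be given more solver iterations at test time.
print(np.linalg.norm(z_from_zeros - z_from_noise))
```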