Guangyao Dou

Hi, my name is Guangyao Dou (窦光耀). I'm a first-year PhD student in Computer Science at the Center for Language and Speech Processing at Johns Hopkins University, advised by Prof. Benjamin Van Durme.

Previously, I completed my master's degree at the University of Pennsylvania, where I worked with Prof. Chris Callison-Burch and Prof. Eric Wong. Before that, I earned a B.S. in Computer Science from Brandeis University, graduating with honors as a member of Phi Beta Kappa (top 10%).

I study how to improve the computational reasoning capabilities of large language models through post-training and reinforcement learning techniques. Previously, I have also worked on the safety and trustworthiness of LLMs.

Email / Master's Thesis / Google Scholar / X (Twitter) / GitHub / LinkedIn

🔥What's New

2026/06/07 Our new survey paper about Inference-Time Control for Trustworthy Large Language Models is now available!
2025/08/25 Happily started my PhD at Johns Hopkins University! 🎉
2025/06/02 Starting a role at Amazon as an Applied Scientist Intern this summer! 🚀
2025/05/15 MANU has been accepted to ACL 2025 Main. Congrats to all collaborators! 🎉
2025/01/22 Two papers have been accepted to NAACL! Congrats to all collaborators! 🎉 Links: MLLMU-Bench, SSU.
2024/12/13 Glad to receive the Best Master's Thesis Award at Penn! 🏆
2024/08/02 Glad to receive the GAPSA Professional Student Individual Grant from Penn! 🎓🏅
2024/07/30 Our new survey paper about Generative AI Machine Unlearning is now available on arXiv!
2024/05/15 One paper has been accepted to ACL 2024👏: SKU. See you in Thailand!
2024/01/23 Our paper ConMU has been accepted to The Web Conference (WWW) 2024!

Selected Publications (* indicates equal contribution) [Google Scholar]

2026

DAR: Deontic Reasoning with Agentic Harnesses
Guangyao Dou, William Jurayj, Nils Holzenberger, Benjamin Van Durme
arXiv preprint, 2026. [paper]

We introduce Deontic Agentic Reasoning (DAR), an agentic setup in which models query long, cross-referenced statutes on demand rather than reading them in full. In evaluations under multiple harnesses on hard subsets of DeonticBench, agentic harnesses push the frontier on deontic reasoning, but gains are uneven: weaker models often degrade on numerical tasks while consuming far more tokens.

DeonticBench: A Benchmark for Reasoning over Rules
Guangyao Dou, Luis Brena, Akhil Deo, William Jurayj, Jingyu Zhang, Nils Holzenberger, Benjamin Van Durme
arXiv preprint, 2026. [paper]

We introduce DeonticBench, a benchmark of 6,232 tasks for evaluating rule-based reasoning in language models across domains including US federal taxes, airline baggage policies, US immigration administration, and US state housing law. The benchmark supports both language-based and computational reasoning approaches. Systems can optionally make use of provided symbolic translations. Results show current models struggle, achieving only 44.4% on numeric subsets and 46.6% on housing cases, and that fine-tuning and reinforcement learning approaches remain unreliable for solving complex rule-reasoning problems.

2025

Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models
Zheyuan Liu, Guangyao Dou, Xiangchi Yuan, Chunhui Zhang, Zhaoxuan Tan, Meng Jiang
Proceedings of ACL 2025 (Main).

We propose Modality Aware Neuron Unlearning (MANU), a novel unlearning framework for MLLMs designed to selectively clip neurons based on their relative importance to the targeted forget data, curated for different modalities. Specifically, MANU consists of two stages: important neuron selection and selective pruning. The first stage identifies and collects the most influential neurons across modalities relative to the targeted forget knowledge, while the second stage is dedicated to pruning those selected neurons.

Avoiding Copyright Infringement via Large Language Model Unlearning
Guangyao Dou, Zheyuan Liu, Qing Lyu, Kaize Ding, Eric Wong
Proceedings of NAACL 2025 (Findings).

We propose Stable Sequential Unlearning (SSU), a novel framework designed to unlearn copyrighted content from LLMs over multiple time steps. Our approach works by identifying and removing specific weight updates in the model's parameters that correspond to copyrighted content. We improve unlearning efficacy by introducing a random labeling loss, and we ensure the model retains its general-purpose knowledge by adjusting targeted parameters.

Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench
Zheyuan Liu, Guangyao Dou, Mengzhao Jia, Zhaoxuan Tan, Qingkai Zeng, Yongle Yuan, Meng Jiang
Proceedings of NAACL 2025 (Main).

We introduce Multimodal Large Language Model Unlearning Benchmark (MLLMU-Bench), a novel benchmark aimed at advancing the understanding of multimodal machine unlearning. MLLMU-Bench consists of 500 fictitious profiles and 153 profiles for public celebrities, each featuring over 14 customized question-answer pairs, evaluated from both multimodal (image+text) and unimodal (text) perspectives.

2024

Towards Safer Large Language Models through Machine Unlearning
Zheyuan Liu, Guangyao Dou, Zhaoxuan Tan, Yijun Tian, Meng Jiang
Proceedings of ACL (Findings), 2024.

We introduce Selective Knowledge negation Unlearning (SKU), a novel unlearning framework for LLMs, designed to eliminate harmful knowledge while preserving utility on normal prompts.

Breaking the Trilemma of Privacy, Utility, Efficiency via Controllable Machine Unlearning
Zheyuan Liu, Guangyao Dou, Yijun Tian, Chunhui Zhang, Eli Chien, Ziwei Zhu
Proceedings of The Web Conference (WWW), 2024.

We present Controllable Machine Unlearning (ConMU), a novel framework designed to facilitate the calibration of MU.

Industrial Experience

Amazon Web Services, Santa Clara, CA, USA (2025.06 - 2025.08)
Applied Scientist Intern. Manager: Mukul Prasad. Mentor: Vidyashankar Sivakumar.

Amazon Payment Service, Seattle, WA, USA (2021.05 - 2021.08)
Software Development Engineer Intern.

Education

Johns Hopkins University, Baltimore, MD, USA (2025.08 - Present)
Ph.D. in Computer Science. Advisor: Prof. Benjamin Van Durme.

University of Pennsylvania, Philadelphia, PA, USA (2023.08 - Present)
MSE in Data Science. GPA: 4.00 / 4.00.

Brandeis University, Waltham, MA, USA (2019.08 - 2023.05)
B.S. in Computer Science. GPA: 3.98 / 4.00.

Teaching

Teaching Assistant, CIS 5190: Machine Learning, University of Pennsylvania (Fall 2024, Spring 2025)
Teaching Assistant, Data Structures and the Fundamentals of Computing, Brandeis University (Fall 2021)

Academic Service

Conference Reviewer: NeurIPS, EMNLP, NAACL, ACL
Journal Reviewer: IEEE Transactions on Information Forensics and Security, npj Digital Medicine (Nature Portfolio)

Miscellaneous

I've always been surrounded by wonderful friends, collaborators, and advisors, and I try to maintain an optimistic outlook. If you're having a tough time and would like someone to talk to, feel free to reach out!
I like basketball, lifting, and making new friends.