About me
I am a first-year Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. I'm fortunate to be advised by Simon Du and Banghua Zhu. Previously, I received my bachelor's degree in Computer Science (Yao class), with a minor in Literature, from Tsinghua University.
Contact
Selected Papers
Understanding the Performance Gap in Preference Learning: A Dichotomy of RLHF and DPO.
Decoding-Time Language Model Alignment with Multiple Objectives.
Rethinking Transformers in Solving POMDPs.
Notes and Slides
Understanding the gaps between two-stage and direct preference-based policy learning. [slide]
The crucial role of samplers in online direct preference optimization. [slide][recording]
Logit mixing and RLHF paper reading. [slide]
Decoding-time language model alignment with multiple objectives. [slide][recording]