I am a research fellow at the Department of Mathematics, National University of Singapore (NUS). Previously, I obtained my PhD at NUS, where I was privileged to be supervised by Prof. Vincent Y. F. Tan. Before that, I received my BS in Mathematics from Beijing Normal University, advised by Prof. Huajie Chen and Prof. Shihua Zhang.

Research

I am broadly interested in both the theoretical limits and empirical applications of reinforcement learning and online learning (e.g., multi-armed bandits), with a current emphasis on theoretically guaranteed algorithm design for LLMs. Feel free to reach out if you share similar interests!

I am currently seeking postdoctoral positions. Please feel free to reach out!

Reinforcement Learning and Bandit Algorithms

Characterization of fundamental limits of decision-making under practical modeling assumptions (e.g., stationary vs. non-stationary, single- vs. multi-agent), possibly subject to realistic constraints (e.g., efficiency or risk requirements).
Design of provable algorithms that approach the fundamental performance limits.

Theoretically-grounded Applications of Reinforcement Learning

Development of practical and provably sound reinforcement learning methods for decision-making during model training and deployment.

News

[2026-06] Posted “On the Benefits of Free Exploration for Regret Minimization in Multi-Armed Bandits”. We study a variant of the regret minimization problem in stochastic MAB, where the learner is granted a free exploration time period before regret starts to accumulate. Joint work with Zixin and Vincent.
[2026-06] “Rethinking ‘RL Generalizes, SFT Memorizes’: The Role of SFT Data” accepted to the first Foundations of Deep Generative Models (FoGen) workshop, ICML 2026, and the 2026 INFORMS Annual Meeting. We revisit the generalization ability of SFT and RL training from the data perspective. Many thanks to my wonderful collaborators!
[2026-06] “Demystifying the Slash Pattern in Attention: The Role of RoPE” accepted to the first Foundations of Deep Generative Models (FoGen) workshop, ICML 2026.
[2026-01] I PhinisheD my PhD study! Many thanks to everyone for their support, especially my supervisor Prof. Vincent Tan!
[2026-01] Posted “Demystifying the Slash Pattern in Attention: The Role of RoPE”. We find that RoPE is related to the emergence of slash patterns in attention matrices. Huge thanks to my amazing collaborators!
[2025-12-22] Yunlong’s homepage is finally online! 🎉 Hello, world!

Yunlong Hou
