Previously, I completed my MPhil in Machine Learning at the University of Cambridge, advised by Samuel Albanie and Bill Byrne.
I did my undergraduate studies at the University of Toronto, advised by Roger Grosse and Sven Dickinson,
and was a researcher at the Vector Institute.
I'm working on human-AI alignment & safety. My long-term goal is to build trustworthy AI systems that effectively assist humans. Previously, I also did research on video retrieval and out-of-distribution generalization. I'm always open to collaboration; if our interests align, please feel free to drop me an email.
- I've been considering these questions on human-AI alignment & safety:
Are we underestimating the ways LLMs might subtly influence daily human decisions?
Do we recognize that certain harms might surface long after the interaction?
How should our alignment algorithms account for these risks?
- I've been considering these questions on AI agents: Are LLMs capable enough for reliable planning?
Do their actions truly align with human intentions? How can we ensure agents act safely, beneficially, and in alignment with human needs?
We found that RLHF can induce significant misalignment when humans provide feedback by implicitly predicting future outcomes, which creates incentives for LLM deception. To address this, we propose RLHS (Hindsight Simulation): by simulating the future outcomes of an interaction before feedback is given, we drastically reduce misalignment.
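To make the idea concrete, here is a minimal sketch of a single hindsight-feedback step. The function names and the toy scoring rule are hypothetical placeholders for illustration, not the paper's implementation.

```python
# Toy sketch of one hindsight-feedback step (RLHS-style).
# All components below are hypothetical placeholders: in practice the policy,
# the outcome simulator, and the rater would be LLMs and/or humans.

def respond(conversation):
    # placeholder policy: the assistant's reply to the dialogue so far
    return "I recommend option A; it has every feature you asked about."

def simulate_outcome(conversation, response, horizon=5):
    # placeholder world model: roll the interaction forward so the downstream
    # consequences of the advice become observable before feedback is given
    return "user bought option A and later found it lacked the needed feature"

def rate_in_hindsight(conversation, response, outcome):
    # placeholder rater: scores the response given the simulated outcome,
    # not just how helpful it sounds in the moment
    return -1.0 if "lacked" in outcome else 1.0

conversation = ["user: which option should I buy?"]
response = respond(conversation)
outcome = simulate_outcome(conversation, response)
reward = rate_in_hindsight(conversation, response, outcome)
print(reward)  # this hindsight reward would then drive RLHF-style fine-tuning
```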
We introduced Branch and Play (B&P), an algorithm that efficiently solves multi-agent spatial navigation problems by determining the optimal order of play.
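As a toy illustration of why the order of play matters: the corridor-crossing scenario and brute-force enumeration below are made up for clarity and are not the B&P algorithm itself.

```python
from itertools import permutations

# Hypothetical scenario: two robots must cross a narrow corridor one at a time,
# and the later mover plans around whoever has already committed a trajectory.
# The chosen order of play changes the total (social) cost.

travel_time = {"robot_A": 3.0, "robot_B": 5.0}  # unobstructed crossing times

def social_cost(order):
    """Sum of arrival times when agents plan sequentially in the given order."""
    corridor_free_at, total = 0.0, 0.0
    for agent in order:
        arrival = corridor_free_at + travel_time[agent]  # wait, then cross
        total += arrival
        corridor_free_at = arrival
    return total

best = min(permutations(travel_time), key=social_cost)
print(best, social_cost(best))  # ('robot_A', 'robot_B') 11.0: faster robot first
```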
We demonstrated that equilibrium models generalize better to harder instances when they are path independent, highlighting the importance of path independence for model performance and scalability.
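For intuition, path independence means the model reaches the same equilibrium no matter where the fixed-point iteration starts. The numerical sketch below uses a made-up contraction as the implicit layer, not an actual trained equilibrium model.

```python
import numpy as np

# Minimal illustration of path independence in an equilibrium (fixed-point) model.
# The implicit layer f below is a made-up contraction for demonstration; a real
# equilibrium model solves z* = f(z*, x) with a learned f.

rng = np.random.default_rng(0)
d = 8
W = 0.4 * rng.standard_normal((d, d)) / np.sqrt(d)  # small weights -> contraction
U = rng.standard_normal((d, d)) / np.sqrt(d)
x = rng.standard_normal(d)

def f(z, x):
    return np.tanh(W @ z + U @ x)  # one implicit layer, iterated to a fixed point

def solve(z0, iters=500):
    z = z0
    for _ in range(iters):
        z = f(z, x)
    return z

z_from_zeros = solve(np.zeros(d))
z_from_noise = solve(rng.standard_normal(d))
# Path independence: both initializations converge to (numerically) the same z*,
# so harder instances can simply be given more solver iterations at test time.
print(np.linalg.norm(z_from_zeros - z_from_noise))
```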