I am an integrated M.S./Ph.D. student at KAIST, advised by Kimin Lee.
I received a B.S. degree from POSTECH with a double major in mathematics and computer science and engineering.
I also studied at Stanford as an exchange student.
I am currently working as a research engineer (contractor via YunoJuno) at Google DeepMind.
My main research interest is building capable and reliable AI agents,
currently with a focus on digital tasks (e.g., web tasks).
We propose a new benchmark for evaluating the safety and helpfulness of mobile device control agents,
with an extensive analysis of the shortcomings of frontier LLM agents in this setting.
A novel benchmark that can serve as a unified testbed for mobile device control agents
performing practical daily tasks across diverse device configurations.
A novel framework that trains a contextualization module to aid the decision-making of LLM agents
achieves super-human performance on the WebShop benchmark.
Reinforcement learning agents become robust to changes in image style (e.g., background color)
by adapting to adversarially generated styles.