My research interests span a wide range of deep generative models (AR, flow, GAN, diffusion,
etc.) applied to sequential data. Specifically, I am working on building multi-modal large language models
with a focus on audio.
During my Ph.D., I focused on time-domain waveform data (speech and audio) to advance generative modeling for audio.
I am also broadly interested in speech and audio applications, including text-to-speech, voice conversion, music generation, neural audio codecs, and audio language models.
Representative papers are highlighted.
Experience
Research Scientist @ NVIDIA
Jan 2024 - Present
In the Applied Deep Learning Research team, I am working on building multi-modal large language models with a focus on audio.
Sep 2021 - Jan 2022
As a research intern, I worked on improving neural vocoders for high-quality speech and audio synthesis, advised by Wei Ping and Boris Ginsburg.
Senior Research Engineer @ Qualcomm AI Research
Feb 2023 - Jan 2024
I developed a framework for Text-to-Speech (TTS) research and development, optimized for deployment on edge devices.
Research Intern @ Microsoft Research Asia
Dec 2020 - May 2021
I worked on diffusion-based generative models for speech synthesis, advised by Xu Tan, Chang Liu, Qi Meng, and Tao Qin.
Dec 2018 - Feb 2019
I worked on the Antigen Map Project, where I applied sequence models to predict antigens from genetic sequences, advised by Bin Shao.
Research Intern @ Kakao Corporation
Jul 2019 - Sep 2019
I worked on improving speech synthesis and voice conversion models, advised by
Jaehyeon Kim and Jaekyong Bae.
Dual B.S. @ Seoul National University
Electrical and Computer Engineering / Applied Biology and Chemistry
Mar 2010 - Aug 2016
Cum Laude
Projects
During my time at DSAIL, I collaborated with Seoul National University Hospital on a computer-aided diagnosis project for liver cancer.
The project yielded a high-performance medical object detection model that helps radiologists reduce errors in the early detection of liver disease.
Invited Talks, Honors, and Awards
Invited Talk "Deep Generative Model for Speech and Audio", Soongsil University, 2023