I am a final-year Ph.D. student (2020-) in Computer Science and Engineering (CSE) at the University of Michigan, Ann Arbor, working with Prof. Justin Johnson and Prof. JJ (Jeong Joon) Park.
I was also very fortunate to work with Prof. Andrew Owens.
I work on generative models and 3D vision, and I am especially interested in world modeling and 3D world creation, including:
(1) 3D/4D reconstruction and generation from 2D data: HexPlane; FWD; Lightplane
(2) Injecting world knowledge into 3D/4D via generative/foundation models (e.g., VLMs, image/video diffusion models): Text2Room; LiftGS
I also enjoy exploring cute ideas in generative models: ViLP
Before that, I was an M.S. student at UMich ECE (2018-2020), and I received my Bachelor's degree from Wuhan University in China (2014-2018).
We (pre-)train a 3D Visual Language Grounding (3D VLG) model with only 2D supervision, by distilling language features from 2D foundation models through render supervision.
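A minimal sketch of the render-supervision idea: render per-pixel language features from a learnable 3D field and match them to features from a frozen 2D teacher. The names `feature_field`, `renderer`, and `teacher_feats` are hypothetical placeholders, not the actual interfaces.

```python
import torch
import torch.nn.functional as F

def distill_language_loss(feature_field, renderer, camera, teacher_feats):
    """Distill 2D foundation-model features into a 3D field via rendering.

    feature_field: learnable 3D representation carrying a feature channel.
    renderer:      differentiable renderer mapping (field, camera) -> (H, W, C).
    camera:        camera parameters of the training view.
    teacher_feats: (H, W, C) features from the frozen 2D foundation model.
    """
    rendered = renderer(feature_field, camera)  # (H, W, C)
    # Cosine similarity is a common choice for normalized embedding spaces;
    # a plain L2 loss would also work.
    loss = 1.0 - F.cosine_similarity(
        rendered.flatten(0, 1), teacher_feats.flatten(0, 1), dim=-1
    ).mean()
    return loss
```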
How can a generator help the agent? We explore visual language priors in VLMs by constructing novel question-image-answer triplets with image diffusion models, and we propose Image-DPO to encourage the model to rely more on its visual inputs.
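A minimal sketch of a DPO-style objective where the preference pair differs in the *image* rather than the text, which is the assumption I make here to illustrate the idea; the variable names are hypothetical, not the paper's API.

```python
import torch
import torch.nn.functional as F

def image_dpo_loss(logp_good, logp_bad, ref_logp_good, ref_logp_bad, beta=0.1):
    """DPO-style preference loss over image pairs: the same answer should be
    more likely under the clean image than under a corrupted one, pushing
    the VLM to actually use its visual input.

    logp_*:     summed log-probs of the answer tokens under the policy model.
    ref_logp_*: the same quantities under the frozen reference model.
    """
    policy_margin = logp_good - logp_bad
    ref_margin = ref_logp_good - ref_logp_bad
    return -F.logsigmoid(beta * (policy_margin - ref_margin)).mean()
```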
We present a neat transformer-based 3D reconstruction and camera pose estimation pipeline that reconstructs 3D scenes from 1000+ images in a single forward pass at ultra-high speed.
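A purely illustrative toy sketch of what "one forward pass over many frames" means as an interface (the architecture details below are my placeholders, not the actual model): all per-image tokens attend jointly, and lightweight heads read out a pose and a coarse point map per frame.

```python
import torch
import torch.nn as nn

class ToyMultiViewRecon(nn.Module):
    """Toy interface sketch: encode N images jointly with a transformer and
    read out a camera pose and a coarse point map per frame in one pass."""

    def __init__(self, dim=256, patch=16):
        super().__init__()
        self.patchify = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.pose_head = nn.Linear(dim, 7)   # quaternion + translation
        self.point_head = nn.Linear(dim, 3)  # 3D point per patch token

    def forward(self, images):               # (N, 3, H, W)
        tokens = self.patchify(images).flatten(2).transpose(1, 2)  # (N, T, D)
        n, t, d = tokens.shape
        feats = self.encoder(tokens.reshape(1, n * t, d))  # all views attend
        feats = feats.reshape(n, t, d)
        poses = self.pose_head(feats.mean(dim=1))          # (N, 7)
        points = self.point_head(feats)                    # (N, T, 3)
        return poses, points
```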
We investigate "flashattention" for NeRF: we present Lightplane Splatter and Lightplane Renderer, a pair of extremely memory efficient modules which can lift 2D images to 3D and render from theoretically any 3D hash representation with 4-5 orders of magnitude memory savings.
We demonstrate their use in a range of 3D reconstruction and generation tasks.
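A minimal sketch of the underlying memory-saving principle (chunked emission/absorption rendering with activation recomputation via checkpointing, in the spirit of FlashAttention), not Lightplane's fused GPU kernels; `decode_fn` is a hypothetical decoder.

```python
import torch
from torch.utils.checkpoint import checkpoint

def render_ray_chunked(decode_fn, samples, n_chunks=8):
    """Render one ray without storing activations for every 3D sample:
    process samples in chunks and recompute inside the backward pass.

    decode_fn: maps sample coords (K, 3) -> (rgb (K, 3), density (K,)).
    samples:   (K, 3) ordered points along one ray.
    """
    transmittance = torch.ones(())  # running transmittance T
    color = torch.zeros(3)
    for chunk in samples.chunk(n_chunks):
        rgb, sigma = checkpoint(decode_fn, chunk, use_reentrant=False)
        alpha = 1.0 - torch.exp(-sigma)  # per-sample opacity
        # w_i = T * prod_{j<i}(1 - alpha_j) * alpha_i within the chunk
        weights = transmittance * torch.cumprod(
            torch.cat([torch.ones(1), 1.0 - alpha[:-1]]), dim=0
        ) * alpha
        color = color + (weights[:, None] * rgb).sum(dim=0)
        transmittance = transmittance * torch.prod(1.0 - alpha)
    return color
```

Checkpointing trades compute for memory exactly as the blurb suggests: the per-sample features are recomputed during backpropagation instead of being kept alive, so peak memory no longer scales with the number of samples along each ray.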