I am a researcher on the Computational Photography team at Meta Reality Labs, working on cutting-edge projects at the
intersection of Computer Vision, Computational Photography, and Machine Learning.
My research interests cover 3D vision, neural rendering, low-level vision, and visual-linguistic understanding.
We present a simple but effective technique to boost the rendering quality, which can be easily integrated with most volumetric view synthesis methods.
The core idea is to transfer color residuals (the difference between the input images and their reconstruction) from training views to novel views.
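As a rough illustration of this idea, the sketch below computes the color residual of each training view and accumulates it into a rendered novel view via depth-based reprojection. The function name, the camera-to-world pose convention, the nearest-neighbour sampling, and the shared intrinsics K are assumptions made for the example, not details of the method itself.

```python
import numpy as np

def transfer_residuals(novel_render, novel_depth, novel_pose, K,
                       train_renders, train_images, train_poses):
    """Add training-view color residuals to a rendered novel view.

    Assumed conventions (hypothetical for this sketch): images are (H, W, 3)
    floats, depths are per-pixel z values, poses are 4x4 camera-to-world
    matrices, and all views share the 3x3 intrinsics K.
    """
    H, W, _ = novel_render.shape
    # Back-project every novel-view pixel to a world-space point.
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).astype(float)
    cam = (np.linalg.inv(K) @ pix.T).T * novel_depth.reshape(-1, 1)
    cam_h = np.concatenate([cam, np.ones((cam.shape[0], 1))], axis=1)
    world = (novel_pose @ cam_h.T).T[:, :3]

    accum = np.zeros((H * W, 3))
    weight = np.zeros((H * W, 1))
    for render, image, pose in zip(train_renders, train_images, train_poses):
        residual = image - render  # color residual of this training view
        world_h = np.concatenate([world, np.ones((world.shape[0], 1))], axis=1)
        cam_t = (np.linalg.inv(pose) @ world_h.T).T[:, :3]
        uv = (K @ cam_t.T).T
        uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        valid = (cam_t[:, 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
        # Nearest-neighbour lookup of the residual; a real system would also
        # check depth consistency and use softer blending weights.
        sample = np.zeros((H * W, 3))
        sample[valid] = residual[v[valid], u[valid]]
        accum += sample
        weight += valid[:, None].astype(float)
    return (novel_render.reshape(-1, 3) + accum / np.clip(weight, 1.0, None)).reshape(H, W, 3)
```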
Proposed an end-to-end trainable burst denoising pipeline that jointly exploits high-resolution and high-frequency deep features derived from wavelet transforms.
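As a loose illustration of wavelet-derived features, the sketch below stacks the subbands of a Haar transform (via PyWavelets' dwt2) as per-frame feature channels. The grayscale input, the pixel-repetition upsampling, and the function name are assumptions for the example; the learned components of the pipeline are omitted.

```python
import numpy as np
import pywt

def wavelet_features(burst):
    """Per-frame wavelet subbands used as high-frequency feature channels.

    `burst` is a list of grayscale frames (H, W); the half-resolution subbands
    are upsampled back to full resolution and stacked as channels. This only
    illustrates wavelet-derived features, not the trained denoising pipeline.
    """
    feats = []
    for frame in burst:
        cA, (cH, cV, cD) = pywt.dwt2(frame, 'haar')
        # Upsample each subband by pixel repetition so that low- and
        # high-frequency channels align with the input grid.
        bands = [np.kron(b, np.ones((2, 2)))[:frame.shape[0], :frame.shape[1]]
                 for b in (cA, cH, cV, cD)]
        feats.append(np.stack(bands, axis=0))  # (4, H, W)
    return np.stack(feats, axis=0)             # (num_frames, 4, H, W)
```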
Incrementally generates complete and consistent 2D or 3D scenes with learned scene priors; real
observations of an actual scene can be incorporated, and unobserved parts of the scene are
hallucinated.
Applications include autonomous agent exploration and few-shot learning.
Presents a method for 3D point cloud segmentation using 2D supervision. A graph-based pyramid feature
network is proposed to capture global and local features of points. A perspective rendering and semantic
fusion module is also introduced to offer refined 2D supervision.
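A minimal sketch of how such 2D supervision could be gathered for 3D points, assuming known camera intrinsics and extrinsics; the helper name, the world-space point and per-view label-image conventions, and the -1 ignore label are hypothetical choices for the example.

```python
import numpy as np

def labels_from_2d(points, semantic_map, K, world_to_cam):
    """Gather per-point labels from a 2D semantic map via perspective projection.

    `points` is (N, 3) in world coordinates, `semantic_map` is an (H, W) label
    image from the corresponding view, `K` is 3x3 intrinsics and `world_to_cam`
    a 4x4 extrinsic matrix. Points outside the image or behind the camera get
    label -1 (ignored during training).
    """
    N = points.shape[0]
    hom = np.concatenate([points, np.ones((N, 1))], axis=1)   # (N, 4)
    cam = (world_to_cam @ hom.T).T[:, :3]                      # (N, 3)
    z = cam[:, 2]
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    H, W = semantic_map.shape
    labels = np.full(N, -1, dtype=np.int64)
    valid = (z > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels[valid] = semantic_map[v[valid], u[valid]]
    return labels
```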
Presents an effective end-to-end framework for detecting multi-lingual scene texts in arbitrary
orientations by integrating a text attention model and a global enhancement block with the pixel-link method,
without adopting pretrained weights or extra synthetic datasets.
To utilize text instances for understanding natural scenes,
we have proposed a framework that combines image-based
text localization with language-based context descriptions
of those instances.
Specifically, we explore the task of unambiguous text localization and retrieval, to accurately localize
a specific targeted text instance in a cluttered image given a natural language description that refers
to it.
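A toy sketch of the retrieval step, assuming candidate text-region features and a query embedding produced by already-trained encoders; cosine similarity is used here only as a stand-in for the learned matching score, and the function name is hypothetical.

```python
import numpy as np

def retrieve_text_region(region_feats, query_feat):
    """Pick the candidate text region that best matches a referring expression.

    `region_feats` is an (N, D) array of features for candidate text regions
    and `query_feat` a (D,) embedding of the natural-language description.
    Returns the index of the best-scoring region and all scores.
    """
    r = region_feats / (np.linalg.norm(region_feats, axis=1, keepdims=True) + 1e-9)
    q = query_feat / (np.linalg.norm(query_feat) + 1e-9)
    scores = r @ q                     # cosine similarity per region
    return int(np.argmax(scores)), scores
```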
We introduce an algorithm based on region trajectories that links object
localization in individual frames to localization across the video sequence.
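A simplified sketch of how per-frame regions might be linked into trajectories, using greedy IoU matching as a stand-in for the actual trajectory construction; the box format and threshold are assumptions for the example.

```python
import numpy as np

def link_trajectories(detections, iou_thresh=0.5):
    """Greedy IoU-based linking of per-frame region proposals into trajectories.

    `detections` is a list over frames; each entry is an (N_t, 4) array of
    boxes (x1, y1, x2, y2). Each trajectory is a list of boxes, one per frame
    in which it was matched.
    """
    def iou(a, b):
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    trajectories = [[box] for box in detections[0]]
    for boxes in detections[1:]:
        used = set()
        for traj in trajectories:
            last = traj[-1]
            scores = [iou(last, b) if j not in used else -1.0
                      for j, b in enumerate(boxes)]
            if scores and max(scores) >= iou_thresh:
                j = int(np.argmax(scores))
                traj.append(boxes[j])
                used.add(j)
        # Boxes that matched no existing trajectory start new ones.
        trajectories += [[b] for j, b in enumerate(boxes) if j not in used]
    return trajectories
```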