
🌟 Jian Zhang | 张舰

Homepage · Google Scholar · CV · Email


Jian Zhang

🎓 Graduate Student
Xiamen University

🔬 Research Intern
Baidu Inc.

🚀 Research Vision

My long-term vision follows a progressive pathway: first achieving 3D-consistent content generation, then developing comprehensive 3D understanding, and ultimately enabling intelligent embodied agents that can navigate and interact within these 3D environments.

🎯 Current Focus Areas

  • 🎬 3D-Consistent Content Generation
  • 🔬 3D Spatial Understanding
  • 🤖 3D Embodied Agents
  • 🎮 Virtual Worlds & Metaverse Applications

🎓 Education

  • Graduate Student | Xiamen University (Sept 2023 - Present)
  • B.S. Artificial Intelligence | Nanchang University (Sept 2019 - June 2023)

💼 Experience

  • Research Intern | Baidu Inc. (Aug 2025 - Present) - Video Generation Research
  • Research Assistant | Texas A&M University (May 2025 - Aug 2025) - 3D Vision & Embodied Intelligence
  • Research Assistant | VITA Group, University of Texas at Austin (Jan 2024 - May 2025) - 3D Spatial Reconstruction & Understanding

📚 Featured Publications

🔥 Recent Highlights

🌟 VLM-3R: Vision-Language Models Augmented with 3D Reconstruction

ArXiv 2025 | Jian Zhang*, Zhiwen Fan*, et al.

A unified VLM framework incorporating 3D reconstructive instruction tuning: it processes monocular video to derive implicit 3D tokens for spatial assistance and embodied reasoning.

Paper Code Project Demo


🌍 DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

Preprint | Kairun Wen*, Yuzhi Huang*, ..., Jian Zhang, et al.

A large-scale dataset with 100K+ videos, 800K+ masks, and 10M+ frames for understanding dynamic physical worlds with evolving 3D structure and motion.

Project Paper Code Demo


🏆 Large Spatial Model: End-to-end Unposed Images to Semantic 3D

NeurIPS 2024 | Jian Zhang*, Zhiwen Fan*, et al.

The first real-time semantic 3D reconstruction system that directly processes unposed RGB images into semantic radiance fields in a single feed-forward pass.

Paper Code Project


⚡ InstantSplat: Sparse-view Gaussian Splatting in Seconds

ArXiv 2024 | Zhiwen Fan*, Kairun Wen*, ..., Jian Zhang, et al.

Lightning-fast sparse-view 3D scene reconstruction using a self-supervised framework that jointly optimizes the 3D scene representation and camera poses.

Paper Code Project



🌟 Open for Opportunities

🎬

3D-Consistent Video Generation
Creating spatially coherent visual content

🔬

3D Spatial Understanding
Developing comprehensive 3D perception

🤝

Research Collaborations
Building the future of 3D AI together

I am particularly interested in opportunities that bridge cutting-edge research with real-world applications.


📫 Contact

Email


Thanks for visiting!

Building the future of 3D AI, one breakthrough at a time
