𝗢𝗻𝗲 𝗺𝗲𝗺𝗼𝗿𝘆 𝗰𝗮𝗻’𝘁 𝗿𝘂𝗹𝗲 𝘁𝗵𝗲𝗺 𝗮𝗹𝗹.
We present 𝗟𝗼𝗚𝗲𝗥, a new 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 architecture for long-context geometric reconstruction.
LoGeR enables stable reconstruction over up to 𝟭𝟬𝗸 𝗳𝗿𝗮𝗺𝗲𝘀 / 𝗸𝗶𝗹𝗼𝗺𝗲𝘁𝗲𝗿 𝘀𝗰𝗮𝗹𝗲, with
Excited to share MonST3R! A simple way to estimate geometry from unposed videos of dynamic scenes.
We achieve competitive results on several downstream tasks (video depth, camera pose) and believe this is a promising step toward feed-forward 4D reconstruction.
monst3r-project.github.io
MonST3R has been accepted to ICLR'25 as a Spotlight!
We have also added a fully feed-forward reconstruction mode that runs in real time on video input (samples at: monst3r-paper.github.io/page0.html). More details here: github.com/Junyi42/monst3…
Introducing St4RTrack!🖖
Simultaneous 4D reconstruction and tracking in world coordinates, fully feed-forward, just by changing the meaning of two pointmaps!
st4rtrack.github.io
One of our goals is to have Optimus learn straight from internet videos of humans doing tasks. Those are often 3rd person views captured by random cameras etc. We recently had a significant breakthrough along that journey, and can now transfer a big chunk of the learning
Humanoids need to perceive the environment in the real world
Using 4D reconstruction techniques, we turn casual human videos into training data for an environment-aware humanoid policy
Super excited to share: VideoMimic.net
Our new system trains humanoid robots using data from cell phone videos, enabling skills such as climbing stairs and sitting on chairs in a single policy.
(w/ @redstone_hong @junyi42 @davidrmcall)
Just arrived in Nashville for #CVPR25! 🥰
I'll present St4RTrack tomorrow morning (10:30–12:30) at the 4D Vision Workshop, poster #137 in Hall 104 B.
Feel free to come and chat!
🚀Introducing “Telling Left from Right” at #CVPR2024
-🔍Identify the problem of 𝐠𝐞𝐨metry-𝐚𝐰𝐚𝐫𝐞 semantic correspondence (SC)
-📐Evaluate foundation model features’ geometric awareness
-🏆Achieve SOTA with a lightweight post-processor
🔗 (w/ code!): telling-left-from-right.github.io
On my way to Seattle ✈️ for my first ever #CVPR! Excited to meet old and new friends. 😄
I'll be presenting our work telling-left-from-right.github.io on Wed. (19th) morning at #284. If you're interested in how a plug-in processor can enhance the Geo-aware SC of SD+DINO, please stop by.
The results are so cool!
4D reconstruction is a very challenging task. I tried to explore it before MonST3R but couldn't make it work. I'm thrilled to see MonST3R playing a part in this reconstruction pipeline!
🚀 Introducing CAT4D! 🚀
CAT4D transforms any real or generated video into dynamic 3D scenes with a multi-view video diffusion model.
The outputs are dynamic 3D models that we can freeze and look at from novel viewpoints, in real-time! Be sure to try our interactive viewer!
Hard to see the details in the figure? Check it out for yourself 😍: monst3r-project.github.io/page1.html
We’ve created an interesting 4D online demo that you can easily explore!
Nice work! Very cool results from carefully designed generative inpainting on MonST3R's partial pointmaps. Glad to see MonST3R and dynamic 3D reconstruction playing an important role.