depth from unconstrained video (unknown camera parameters). the results look really impressive!
arxiv.org/pdf/1904.04998…
Tejas Kulkarni
Scientist @GoogleDeepMind. ex CEO @CSM_ai. Interested in AGI, Brain and AI creativity. PhD @mitbrainandcog
- Special thanks to @GoogleDeepMind for inviting me to try out Genie 3. I'm excited to share my thoughts on this early research prototype and also some of my live recordings below: I spent the whole day playing with the system and when it works, it is truly mind blowing🤯. It is
- #BostonProtests This is the view from our apartment. Never seen anything like this. The crowd size is unbelievable (20 minutes of real time - instagram.com/tv/CA33u4sABPk…)
- Our work shows that adding geometric inductive biases in neural nets enables spatio-temporally consistent (hundreds of steps) object keypoints. This enables agents that play Atari games on a single machine with less than 100k steps + deeply explore hard envs without rewards.
  Deep RL agents are data hungry and often learn task-specific representations. Our model learns object-centric abstractions from raw videos. This enables highly data-efficient RL and structured exploration. arxiv.org/abs/1906.11883
- The rumors are true - AI energy in the valley is on fire. There is a deep understanding and excitement around building products with AI as one of the core moats. Before it always felt like people used AI-first as a cliche to bloat startups but now there is a genuine realization
- dall-e2 from @OpenAI is not just an interesting AI tool. IMO it's the most compelling demonstration of compositionality due to its multimodal nature. While we need a lot more for AGI (videos, 3d, abstractions, behaviors etc), this is the first demo that pulls down my timeline of AGI
- CNNs are all you need -> attention is all you need -> MLPs are all you need -> compute is all you need -> $ is all you need
  New paper from Brain Zurich and Berlin! We try a conv and attention free vision architecture: MLP-Mixer (arxiv.org/abs/2105.01601) Simple is good, so we went as minimalist as possible (just MLPs!) to see whether modern training methods & data is sufficient...
- We have released the code for transporter, a neural network architecture for unsupervised learning of object keypoints (now a NeurIPS paper): github.com/deepmind/deepm…
  Deep RL agents are data hungry and often learn task-specific representations. Our model learns object-centric abstractions from raw videos. This enables highly data-efficient RL and structured exploration. arxiv.org/abs/1906.11883
- To test the limits of AI-generated code and 3D assets, I vibe-coded this 3D multi-player FPS game over the weekend. It is scary how well it worked. Here are some key insights (with blog and game link in thread): - Automated play testing is the biggest open problem and SIMA from
- Just moved to Boston after spending a few exciting and inspiring years at DeepMind. Excited for the next big adventure.
- Replying to @ylecun
  I see where you are coming from. But technological advancements can have many dimensions - a novel scientific hypothesis and validation of it is one. Before chatgpt, I didn’t see my friends or family mention llm. The technical advance here over other llms was human alignment + UX
- Image to 3D: 3d.csm.ai/demo/modern-cl… We are solving this problem at an unprecedented pace. This is just the beginning and we won't stop until we get to human-level performance to disrupt the market. Join us if you want to help create the next breakthrough:
- delusions at an all time high in AI






