Rowan Zellers
Thinking Machines
614 posts
@rown
multimodal @thinkymachines. I also like to climb rocks and throw pottery. rowanzellers.com (he/him)
San Francisco, CA
Joined November 2008
1,031
Following
15.3K
Followers
  • Pinned
    Rowan Zellers
    Thinking Machines
    @rown
    May 11
    We are so back!
    Thinking Machines
    @thinkymachines
    May 11
    People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. thinkingmachines.ai/blog/interacti…
    53K
  • Rowan Zellers
    Thinking Machines
    @rown
    Feb 25, 2025
    life update: I've joined @thinkymachines lab! We're building the future of human-AI interaction through open science, research+product co-iteration, and with multimodal at the core. If you're interested in joining our fantastic team - reach out! DMs open 😀
    649K
  • Rowan Zellers
    Thinking Machines
    @rown
    Feb 12, 2023
    I wrote a blog post on why I decided to join OpenAI instead of academia. (after I went on the academic & industry job markets, and got offers from both.) This post (pt2 in a series) took a while 😅- hoping my experience helps others make life decisions!
    Why I chose OpenAI over academia: reflections on the CS academic and industry job markets (part 2)
    From rowanzellers.com
    143K
  • Rowan Zellers
    Thinking Machines
    @rown
    Sep 8, 2022
    Belated announcement: after finishing my PhD, I’ve joined @OpenAI as a research scientist! I’m excited to continue working towards building multimodal, grounded AI - with a fantastic group of collaborators here.
  • Rowan Zellers
    Thinking Machines
    @rown
    Nov 20, 2023
    OpenAI is nothing without its people
    30K
  • Rowan Zellers
    Thinking Machines
    @rown
    Jun 25, 2019
    We've released the code and model checkpoints for Grover, our 1.5B parameter neural fake news detector (and generator): github.com/rowanz/grover Excited to see what the community will find, especially regarding how to better spot machine-generated disinformation!
  • Rowan Zellers
    Thinking Machines
    @rown
    Mar 16, 2022
    Introducing 🍷MERLOT Reserve, a new model that learns about vision, language, and sound all together -- by watching millions of YouTube videos. To appear at #CVPR2022. page: rowanzellers.com/merlotreserve paper: arxiv.org/abs/2201.02639 demo: merlot-reserve.apps.allenai.org 🧵/5
  • Rowan Zellers
    Thinking Machines
    @rown
    Jan 8, 2019
    Interested in computer vision/AI models that perform cognition-level visual reasoning? Today, I'm releasing @PyTorch code and pretrained models for Visual Commonsense Reasoning, available at github.com/rowanz/r2c More info (+paper) at visualcommonsense.com 😃
  • Rowan Zellers
    Thinking Machines
    @rown
    Mar 2, 2025
    Impressive demo. Worth noting, it has memory -- call it twice and it remembers what was talked about in the last call. Personalization and memory are so critical for voice models to feel "human", and it feels like we're only at the beginning here.
    Sesame
    @sesame
    Feb 27, 2025
    At Sesame, we believe in a future where computers are lifelike. Today we are unveiling an early glimpse of our expressive voice technology, highlighting our focus on lifelike interactions and our vision for all-day wearable voice companions. sesame.com/voicedemo
    41K
  • Rowan Zellers
    Thinking Machines
    @rown
    Jun 7, 2021
    Introducing MERLOT: a new model that learns about language, vision, & the world from 6M YouTube videos. Out-of-the-box, MERLOT has intrinsic notions of multimodal temporal commonsense. When finetuned, we get SOTA performance on 12 video tasks + VCR. arxiv.org/abs/2106.02636
    An overview of our model named "MERLOT" (which is depicted as a wine glass.) We train our model on 6 million youtube videos in a self-supervised way. The model can be applied for a variety of tasks including:

Video QA (example: a video clip of two people interacting, and a question "What's she holding onto before he leaves?")

Commonsense Single-image QA (example: a single image of some people in a diner and the question "why is the man pointing")

Visual story ordering (example: a story "The old man was riding on an escalator. He was almost at the top. His kids were already there." and images for each sentence.)
  • Rowan Zellers
    Thinking Machines
    @rown
    May 13, 2024
    Excited to introduce GPT-4o. Language, vision, and sound -- all together and all in real time. This thing has been so much fun to work on. It's been even more fun to play with -- with moments of magic where things feel totally fluid and I forget I'm video chatting with an AI.
    129K
  • Rowan Zellers
    Thinking Machines
    @rown
    Nov 28, 2018
    Today we're announcing Visual Commonsense Reasoning, a new task and large-scale dataset, with the goal to bring AI systems from recognition- to cognition-level understanding. Joint work with @ybisk, Ali Farhadi, and @YejinChoinka. visualcommonsense.com arxiv.org/abs/1811.10830
  • Rowan Zellers
    Thinking Machines
    @rown
    Jul 15, 2025
    If you’re excited to build the future of multimodal human/ai collaboration, and jam with Andrew, me, and many other talented people across the stack— dm me! 😀
    Andrew Hyunsoo Lee
    @alhyunsoo
    Jul 15, 2025
    life update: I joined @thinkymachines! feeling so lucky to build with such a kind, brilliant team, esp pairing with researchers early on as a designer. looking forward to sharing more soon.
    49K
  • Rowan Zellers
    Thinking Machines
    @rown
    Mar 14, 2023
    We've launched GPT4! Among other things -- I'm excited that it can read an image, and analyze it at a level beyond object- or scene recognition, communicating the result in helpful language. Still more followup work to be done characterizing this 😀 openai.com/research/gpt-4
    User: What is funny about this image? Describe it panel by panel.
GPT4: The image shows a package for a "Lightning Cable" adapter with three panels.

Panel 1: A smartphone with a VGA connector (a large, blue, 15-pin connector typically used for computer monitors) plugged into its charging port.

Panel 2: The package for the "Lightning Cable" adapter with a picture of a VGA connector on it.

Panel 3: A close-up of the VGA connector with a small Lightning connector (used for charging iPhones and other Apple devices) at the end.

The humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.
    42K
