Jim Bohnslav (@jbohnslav) / X

Jim Bohnslav

7,157 posts

@jbohnslav

post-training VLAs @zoox

Boston, MA

Joined February 2011

Pinned
Jim Bohnslav
@jbohnslav
Jun 3
Realized I never posted this: I gave a talk at ReInvent this year about our multimodal-language-action model and the infra needed to train it. link 👇
1.2K
Jim Bohnslav
@jbohnslav
Apr 4, 2025
MFW I finally have to learn RL to train some reasoning models
75K
Jim Bohnslav
@jbohnslav
Jul 30, 2025
> be me, training vlms
> use hf transformers because who wants to reimplement models if they don't have to
> low MFU
> `pytorch_profiler.py`
> ViT takes 40X longer than the LLM
> sus
> .item() in the forward pass makes cudaStreamSynchronize every attention layer
> MFW
39K
Jim Bohnslav
@jbohnslav
Sep 25, 2020
Happy to present the first paper from my PhD! biorxiv: biorxiv.org/content/10.110… code: github.com/jbohnslav/deep… Install now using `pip install deepethogram`! (1/N)
Jim Bohnslav
@jbohnslav
Apr 15, 2025
bytedance calling me GPU poor
A model trained for 665,000 H100 hours is called "cost efficient", "moderate computational resources"
28K
Jim Bohnslav
@jbohnslav
Feb 18, 2025
I rarely do this, but this is an absolutely useless benchmark. The examples are not problems I care if my computer vision system can solve. I'm going to treat ZeroBench performance as an anti-signal that someone made an overfit, test-taking VLM. I bet the next Phi will do great.
Jonathan Roberts
@JRobertsAI
Feb 17, 2025
Is computer vision “solved”?
Not yet
Current models score 0% on ZeroBench
🧵1/6
31K
Jim Bohnslav
@jbohnslav
Jan 2, 2025
Replying to @iScienceLuvr
I haven't found any good model to write greentexts since
10K
Jim Bohnslav
@jbohnslav
Feb 19, 2025
SimDINO: simplifying DINO/ DINOv2. They use L2 loss on global <-> local features and a penalty on feature covariance to prevent collapse. With these additions, they can get rid of a lot of the bells and whistles in DINO/v2 training while improving representations (ImageNet-KNN).
16K
Jim Bohnslav
@jbohnslav
Jan 17, 2025
DiMA: distilling a VLM into a planner for driving. They use VAD as a baseline and pass tokens from its models into an LLM. The LLM is trained to do VQA, MAE, future prediction, and scene editing. You distill the LLM into a planner transformer, so you don't need the LLM for inference.
13K
Jim Bohnslav
@jbohnslav
Aug 8, 2025
Replying to @natolambert
you can enable model selector in settings
19K
Jim Bohnslav
@jbohnslav
Dec 26, 2024
What would the deepseek team accomplish with a month or two on xAI's H100 cluster?
Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞)
@teortaxesTex
Dec 26, 2024
> $5.5M for Sonnet tier
it's unsurprising that they're proud of it, but it sure feels like they're rubbing it in. «$100M runs, huh? 30.84M H100-hours on 405B, yeah? Half-witted Western hacks, your silicon is wasted on you, your thoughts wouldn't reduce loss of your own models»
96K
Jim Bohnslav
@jbohnslav
Sep 3, 2021
Super excited that DeepEthogram is published officially today in @eLife!! elifesciences.org/articles/63377 (1/N)
elifesciences.org
DeepEthogram, a machine learning pipeline for supervised behavior classification from raw pixels
DeepEthogram automatically classifies animal behavior videos into researcher-defined behaviors of interest, saving researcher time and enabling more detailed downstream analysis of behavior.
Jim Bohnslav
@jbohnslav
Oct 4, 2019
Replying to @neeratanden
It's definitely not moral compunction. If they don't help Trump, it's because they think it's not in their interest, not against their morals
Jim Bohnslav
@jbohnslav
Jul 2, 2025
Replying to @dejavucoder
My man must be a leetcode machine
16K