Life update: I'll be joining Berkeley EECS as a PhD student starting in fall 2025, playing around with multimodal models and llms, being part of Sky Lab & BAIR, and enjoying the unreal™️ weather 🏖️ CA has to offer!
Zirui "Colin" Wang
Research Intern @MetaAI; CS PhD Student @Berkeley_AI and @BerkeleySky; prev @Princeton_NLP, @HDSIUCSD, @VoioInc multimodal interaction
- While DeepSeek R1 has been flexing 💪🏻, how are VLMs progressing in 𝐫𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠? ⚠️ Major Shift: the latest 𝐨𝐩𝐞𝐧-𝐰𝐞𝐢𝐠𝐡𝐭 Qwen2.5-VL has beaten the first GPT-4o and is now on par with the latest ChatGPT-4o! 😲 But what about o1-like models? Can they enhance
- 🤨 Are Multimodal Large Language Models really as 𝐠𝐨𝐨𝐝 at 𝐜𝐡𝐚𝐫𝐭 𝐮𝐧𝐝𝐞𝐫𝐬𝐭𝐚𝐧𝐝𝐢𝐧𝐠 as existing benchmarks such as ChartQA suggest? 🚫 Our ℂ𝕙𝕒𝕣𝕏𝕚𝕧 benchmark suggests NO! 🥇Humans achieve ✨𝟖𝟎+% correctness. 🥈Sonnet 3.5 outperforms GPT-4o by 10+ points,
- 🤖 Welcome 𝐆𝐏𝐓-𝟒𝐨 𝐌𝐢𝐧𝐢 and 𝐈𝐧𝐭𝐞𝐫𝐧𝐕𝐋𝟐 𝐋𝐋𝐚𝐌𝐀-𝟑 𝟕𝟔𝐁 to the CharXiv (charxiv.github.io) leaderboard for chart understanding! As concurrently released models, GPT-4o Mini is 𝐛𝐞𝐚𝐭𝐞𝐧 𝐛𝐲 𝐭𝐡𝐞 𝐨𝐩𝐞𝐧-𝐰𝐞𝐢𝐠𝐡𝐭 𝐨𝐧𝐞. 🎊 Congratulations to
- We present 🧩 TokenCompose, a text-to-image latent diffusion model trained with fine-grained grounding objectives for enhanced compositionality and photorealism. 🌐 Website: mlpc-ucsd.github.io/TokenCompose/ 📃 Paper: huggingface.co/papers/2312.03… 🖥️ Code: github.com/mlpc-ucsd/Toke… 🧵[1/n]
- 🎉Exciting news in Multimodal LLMs! We're excited to see that 𝐈𝐧𝐭𝐞𝐫𝐧𝐕𝐋 𝐂𝐡𝐚𝐭 𝐕𝟐.𝟎 and 𝐂𝐚𝐦𝐛𝐫𝐢𝐚𝐧 now lead the 𝐂𝐡𝐚𝐫𝐗𝐢𝐯 leaderboard (charxiv.github.io) in chart understanding for open-weight models. 🤔What leads to their success? Here's some of
- 🚨 I'll be presenting CharXiv this Friday morning at #neurips and Sunday at the MAR workshop. I'm 🤗 to connect with new friends and chat about developing/enhancing multimodal models (text-to-image, VLMs, etc) and their evaluations! Let's meet up at the conference :)
- Just finished responding to the authors' rebuttals for all papers in my batch that had one. I hope these timely responses give people more time/rounds for healthy and meaningful discussions on their papers! 👀 #NeurIPS
- i've been working on my master's thesis and finally got something worth mentioning for the broader impact of the research work i did last year -- it's not another benchmark but an eval that people and devs care about and i'm ready to build more of them :p
- I'm honored to join the Siebel Scholars '25 cohort and committed to making AI systems empirically usable and useful in the years to come :). Congrats to @chochijimat, @danfriedman0, @sunniesuhyoung, @SadhikaMalladi and @zwcolin on being named 2025 @SiebelScholars! Now in its 24th year, the Siebel Scholars Program awards fellowships to students based on academic achievement and leadership. bit.ly/3ZLtrv4
- Replying to @IEEESpectrum @_akhaliq and @arankomatsuzaki: The original article claimed a *correlation*: "the correlation between influencer tweets and citation count," but your tweet statement made it sound like a *causation*. This is not rigorous at all. One can also say that influencers are forward-thinking in selecting papers.
- Today we are excited to share that 🧩 TokenCompose has been accepted to #CVPR2024. See you soon in Seattle!
- Our paper (and my first RL paper ever): "On the Feasibility of Cross-Task Transfer with Model-Based Reinforcement Learning" got accepted @iclr_conf! 🧵(1/3)
- I'll present CharXiv at tmr's Multimodal Algorithmic Reasoning workshop for a spotlight talk at 11:45am followed by a poster session at 2:15pm in West Building Exhibit Hall A. If you are interested in or working on developing/evaluating multimodal models, let's connect there!