Wonmin Byeon

98 posts

@wonmin_byeon

Researcher

California

wonmin-byeon.github.io

Joined March 2020

Pinned
Wonmin Byeon
@wonmin_byeon
Mar 4
🚀 New paper: Mamba–Transformer hybrid VLMs can go fast without forgetting. We introduce stateful token reduction for long-video VLMs.
✅ Only 25% of visual tokens
🚀 3.8–4.2× faster prefilling (TTFT)
🎯 Near-baseline accuracy (can exceed baseline with light finetuning)
Wonmin Byeon
@wonmin_byeon
Jan 16, 2025
We are hiring a PhD intern at NVIDIA Research. If you are interested, email me your CV. Research topics: SSM/Mamba, other alternative architectures for LLMs/multimodal LLMs, or a more interesting topic than these :)
Wonmin Byeon
@wonmin_byeon
Mar 10, 2023
Check out our #CVPR2023 paper!
AK
@_akhaliq
Mar 9, 2023
Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models. With COCO training only, our method achieves 23.4 PQ and 30.0 mIoU on the ADE20K dataset, an absolute improvement of 8.3 PQ and 7.9 mIoU over the previous state of the art. abs: jerryxu.net/ODISE/
Wonmin Byeon
@wonmin_byeon
Jun 13, 2024
Here is our new 8B Mamba-based hybrid LLM: higher MMLU than the 8B Transformer, and long-context extension up to a sequence length of 128K.
Bryan Catanzaro
@ctnzr
Jun 13, 2024
An 8B-3.5T hybrid SSM model gets better accuracy than an 8B-3.5T transformer trained on the same dataset:
* 7% attention, the rest is Mamba2
* MMLU jumps from 50 to 53.6%
* Training efficiency is the same
* Inference cost is much less
arxiv.org/pdf/2406.07887
arxiv.org
An Empirical Study of Mamba-based Language Models
Selective state-space models (SSMs) like Mamba overcome some of the shortcomings of Transformers, such as quadratic computational complexity with sequence length and large inference-time memory...
Wonmin Byeon
@wonmin_byeon
Jan 16, 2025
Replying to @wonmin_byeon
email: [email protected] Feel free to DM me if you have questions.
Wonmin Byeon
@wonmin_byeon
Dec 11, 2023
📢 Excited to share our work at #NeurIPS2023: ConvSSM, a powerful sequence model for long videos. poster: Tuesday at 5:15pm, Great Hall & Hall B1+B2 #705 arxiv.org/abs/2310.19694 github.com/nvlabs/convssm (coming soon) Work done with @jimmysmith1919 @shalinidemello @jankautz 🧵👇
Wonmin Byeon
@wonmin_byeon
Jul 17, 2024
I will give a talk at KAIST today (July 17th) at 5pm PDT. The talk is about Mamba-based models and the findings from our recent paper. arxiv.org/abs/2406.07887 Everyone is welcome to join! The Zoom link is below.
Alice Oh
@aliceoh
Jul 9, 2024
Excited to host a Zoom talk by Dr. Wonmin Byeon on her research with Nvidia colleagues on "An Alternative Architecture for Efficient Large Language Models (LLMs)". The talk is on July 17th at 5 pm PDT (July 18th, 9 am KST): kaist.zoom.us/my/aliceatkaist Abstract: Widely used Large
Wonmin Byeon
@wonmin_byeon
Feb 14, 2024
ConvSSM: State Space Models for long videos 🎉 We finally released the code and the pretrained models. Code: github.com/NVlabs/ConvSSM Paper: arxiv.org/abs/2310.19694 @NVIDIAAI @jimmysmith1919
Wonmin Byeon
@wonmin_byeon
Dec 11, 2023
📢 Excited to share our work at #NeurIPS2023: ConvSSM, a powerful sequence model for long videos. poster: Tuesday at 5:15pm, Great Hall & Hall B1+B2 #705 arxiv.org/abs/2310.19694 github.com/nvlabs/convssm (coming soon) Work done with @jimmysmith1919 @shalinidemello @jankautz 🧵👇
Wonmin Byeon
@wonmin_byeon
Jul 19, 2022
Check out our spotlight & poster at #ICML2022 tomorrow!
Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework
arxiv.org/abs/2106.09121
Wed 20 July
10:45-10:50am EDT, Ballroom 1&2 (Spotlight)
6:30-8:30pm EDT, Hall E #229 (Poster)
w/ Jiahao Su, @furongh
Wonmin Byeon
@wonmin_byeon
Nov 24, 2024
Our new hybrid model is out! Our Hymba-1.5B even outperforms LLaMA 3.2-3B. Check out the paper for more details.
Pavlo Molchanov
@PavloMolchanov
Nov 22, 2024
Sharing our team’s latest work on Hymba - an efficient small language model with hybrid architecture. Tech report: arxiv.org/abs/2411.13676 Discover the tradeoff between Mamba and Attention, how they can be combined, how attention sink and forced-to-attend phenomena can be
Wonmin Byeon
@wonmin_byeon
Jan 17, 2025
Replying to @wonmin_byeon
Sorry. My DM is open now.
Wonmin Byeon
@wonmin_byeon
Jan 4, 2024
@NVIDIAAI is hiring interns! If you are interested in topics related to SSMs, RNNs, and Transformers for spatiotemporal modeling, send me an email with your CV. Check out our recent #NeurIPS2023 paper about Convolutional State Space Models (ConvSSM) 👇
Wonmin Byeon
@wonmin_byeon
Dec 11, 2023
📢 Excited to share our work at #NeurIPS2023: ConvSSM, a powerful sequence model for long videos. poster: Tuesday at 5:15pm, Great Hall & Hall B1+B2 #705 arxiv.org/abs/2310.19694 github.com/nvlabs/convssm (coming soon) Work done with @jimmysmith1919 @shalinidemello @jankautz 🧵👇
Wonmin Byeon
@wonmin_byeon
Aug 31, 2020
Check out the full videos of our tutorial: youtube.com/playlist?list=… Program: nvlabs.github.io/eccv2020-mixed…
Andrej Karpathy
@karpathy
Aug 30, 2020
good quick tutorial on optimizing your PyTorch code ⏲️: youtube.com/watch?v=9mS1fI… quick summary:
Wonmin Byeon
@wonmin_byeon
Mar 28, 2022
Try out our segmentation demo!
Jiarui Xu
@Jerry_XU_Jiarui
Mar 27, 2022
📢The code is released! github.com/NVlabs/GroupViT 🔥Try out the demo for GroupViT on @huggingface Spaces with @Gradio huggingface.co/spaces/xvjiaru…