Benjamin Warner

@benjamin_warner

Research @SophontAI. Previously answerdotai. Vaccines save lives.

benjaminwarner.dev

Joined September 2011

Pinned
Benjamin Warner
@benjamin_warner
Dec 19, 2024
Today we released ModernBERT, the first encoder to reach SOTA on most common benchmarks across language understanding, retrieval, and code, while running twice as fast as DeBERTaV3 on short context and three times faster than NomicBERT & GTE on long context.
Benjamin Warner
@benjamin_warner
Jul 31, 2023
I've written up my study group lectures on implementing Transformers in PyTorch into a blog series: Creating Transformers from Scratch:
- Part 1: The Attention Mechanism benjaminwarner.dev/2023/07/01/att…
- Part 2: The Rest of the Transformer benjaminwarner.dev/2023/07/28/res…
Benjamin Warner
@benjamin_warner
Dec 8, 2021
I published a post comparing the newly launched SageMaker Studio Lab with Kaggle, Google Colab, and Colab Pro. SageMaker is a strong contender for those starting out in deep learning and almost a straight upgrade from the free version of Colab. 1/3 benjaminwarner.dev/2021/12/08/tes…
Benjamin Warner
@benjamin_warner
Nov 17, 2022
PyTorch's fused foreach methods, used in their multi-tensor optimizers, are surprisingly undocumented given how useful and fast they are. Rewrote my EMACallback and the EMA step is now significantly faster on tested models vs a standard for loop over both sets of weights.
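A minimal sketch of the idea in the post above, not the actual EMACallback from fastxtend. It compares a standard loop over both sets of weights with PyTorch's multi-tensor `torch._foreach_mul_` and `torch._foreach_add_` calls. The function names `ema_loop` and `ema_foreach` and the `decay` parameter are assumptions made for illustration.

```python
import torch


@torch.no_grad()
def ema_loop(ema_params, model_params, decay):
    # Baseline: one pair of in-place ops per tensor.
    for ema, param in zip(ema_params, model_params):
        ema.mul_(decay).add_(param, alpha=1 - decay)


@torch.no_grad()
def ema_foreach(ema_params, model_params, decay):
    # Multi-tensor version: each call operates on the whole list at once.
    # ema = decay * ema
    torch._foreach_mul_(ema_params, decay)
    # ema = ema + (1 - decay) * param
    torch._foreach_add_(ema_params, model_params, alpha=1 - decay)


if __name__ == "__main__":
    weights = [torch.randn(4, 4), torch.randn(8)]
    ema_a = [torch.zeros_like(w) for w in weights]
    ema_b = [torch.zeros_like(w) for w in weights]
    ema_loop(ema_a, weights, decay=0.99)
    ema_foreach(ema_b, weights, decay=0.99)
    print(all(torch.allclose(a, b) for a, b in zip(ema_a, ema_b)))
```

Both functions compute the same update. Any speed difference depends on the model, the device, and the PyTorch version, so it is worth measuring on your own setup.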
Benjamin Warner
@benjamin_warner
Mar 14, 2024
We published our technical deep dive into FSDP+QLoRA, a technique which enables training 70B models on dual consumer GPUs. We cover how to add FSDP support to quantization libraries and how to add QLoRA support to FSDP training frameworks.
answer.ai
Enabling 70B Finetuning on Consumer GPUs – Answer.AI
A detailed guide for adding FSDP and QLoRA support to quantization libraries and training frameworks.
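A rough back-of-the-envelope sketch of why a 70B model can fit on two 24GB GPUs with FSDP+QLoRA. It counts only the base model weights and assumes 4-bit quantized weights sharded evenly across both GPUs. Quantization constants, LoRA adapters, activations, optimizer state, and framework overhead are all ignored, so real usage is higher. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 70_000_000_000  # 70B parameters
N_GPUS = 2                 # assumption: weights sharded evenly by FSDP

half_precision = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
four_bit = weight_memory_gb(N_PARAMS, 4)         # 4-bit quantized weights
per_gpu = four_bit / N_GPUS

print(f"16-bit weights: {half_precision:.1f} GB")
print(f"4-bit weights:  {four_bit:.1f} GB")
print(f"4-bit per GPU:  {per_gpu:.1f} GB of 24 GB")
```

Under these assumptions the 4-bit weights take about 17.5 GB per GPU, which leaves some headroom on a 24GB card for everything the estimate leaves out.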
Benjamin Warner
@benjamin_warner
Oct 28, 2021
Inspired by PyTorch Lightning's SimpleProfiler, I created a Simple Profiler Callback for fast.ai (technically two). Simple to use:
Benjamin Warner
@benjamin_warner
Jun 14, 2022
Over the past week, @capetorch and I discovered, debugged, and created a workaround for a performance bug in PyTorch which reduces image training GPU throughput up to forty percent. The culprit? Subclassed tensors. benjaminwarner.dev/2022/06/14/deb… 1/6
benjaminwarner.dev
Discovering and Debugging a PyTorch Performance Decrease - Subclassed Tensors Reduce GPU Throughput...
Over the past week, Thomas Capelle and I discovered, debugged, and created a workaround for a performance bug in PyTorch which reduced image training GPU throughput up to forty percent when using...
Benjamin Warner
@benjamin_warner
Jun 6, 2022
Over the weekend I released fastxtend to pypi and wrote a post highlighting some of its features. Fastxtend is a collection of tools, extensions, and addons for @fastdotai.
benjaminwarner.dev
Introducing fastxtend - A Collection of Tools, Extensions, & Addons for fastai
Fastxtend is a collection of tools, extensions, and addons for fastai. In this post, I highlight some of fastxtend’s current best features.
Benjamin Warner
@benjamin_warner
Apr 18, 2024
If finetuning Llama 3 w/ Hugging Face, use Transformers 4.37 or 4.40. Llama & Gemma in 4.38 & 4.39 don't use PyTorch's Flash Attention 2 kernel, leading to high memory usage. 4.40 uses FA2 in eager mode, but not with torch.compile. I'm working with HF to fully fix this.
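A small sketch that encodes the version guidance from the post above as a lookup. It reflects only what the post states for Transformers 4.37 through 4.40 and makes no claim about other releases. The function names and the status strings are hypothetical, chosen for illustration.

```python
def parse_major_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as '4.38.2'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def llama_attention_status(version: str) -> str:
    """Classify a Transformers version per the post's guidance."""
    v = parse_major_minor(version)
    if v in ((4, 38), (4, 39)):
        # Post: Llama & Gemma don't use the Flash Attention 2 kernel here.
        return "avoid"
    if v == (4, 40):
        # Post: FA2 is used in eager mode, but not with torch.compile.
        return "ok-eager-only"
    if v == (4, 37):
        return "ok"
    # The post says nothing about other versions.
    return "unknown"


if __name__ == "__main__":
    for ver in ("4.37.2", "4.38.0", "4.39.3", "4.40.0", "4.41.0"):
        print(ver, llama_attention_status(ver))
```

To check an installed copy, pass `transformers.__version__` to `llama_attention_status`.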
Benjamin Warner
@benjamin_warner
Aug 2, 2022
The latest fastxtend release brings automatic Progressive Resizing to @fastdotai. Inspired by @MosaicML's implementation for Composer, which in turn was inspired by the fastai course, Progressive Resizing can speed up training with a small decrease in model performance.
Benjamin Warner
@benjamin_warner
Oct 31, 2022
The latest release of fastxtend brings fused optimizers to @fastdotai. The fused versions are drop-in replacements for fastai's native optimizers, and step 21 to 293 percent faster on modern hardware.
Benjamin Warner
@benjamin_warner
Mar 8, 2024
You can now finetune 70B models on two 24GB GPUs using FSDP+QLoRA. Joint work with @johnowhitaker & @keremturgutlu at Answer.ai. Our example code is here: github.com/AnswerDotAI/fs… We plan on publishing a technical walkthrough early next week.
Jeremy Howard
@jeremyphoward
Mar 7, 2024
Today, with @Tim_Dettmers, @huggingface, & @Mobius_Labs, we're releasing FSDP/QLoRA, a new project that lets you efficiently train very large (70b) models on a home computer with consumer gaming GPUs. 1/🧵 answer.ai/posts/2024-03-…
Benjamin Warner
@benjamin_warner
Mar 8, 2024
If you're interested in integrating FSDP+QLoRA into your training framework, feel free to reach out. We plan on publishing a technical walkthrough early next week.
Hamel Husain
@HamelHusain
Mar 8, 2024
FSDP + QLoRA from is now merged into @axolotl_ai 🔥 GPU Poor -> GPU Rich💰 @winglian fast on the integration per usual 🏇 w/ @johnowhitaker for behind the scenes support This is how to use it: 1. Upgrade axolotl per the README 2. Set `adapter: qlora` in the config 3. fdsp
Benjamin Warner
@benjamin_warner
Jan 18, 2024
I'm excited to share that I joined answer.ai. I'm working with a great founding team on AI Research & Development, Open-Source, and collaborations with AI Researchers and Developers.