younes (@yb2698) / X

1,613 posts

younes

@yb2698

Joined July 2022

younes
@yb2698
Jul 18, 2023
Llama-2 was just released by @Meta AI and you can already use it in the @huggingface ecosystem. How do you fine-tune the model on your own data? We are releasing a simple fine-tuning script for single- and multi-GPU setups to get you started in a few lines of code: gist.github.com/younesbelkada/…
135K
younes
@yb2698
Feb 10, 2023
You asked for it. You can now fine-tune a model that has been loaded in 8-bit. With 8-bit fine-tuning, every 1B parameters costs only about 1 GB of GPU RAM for the weights, making it easier to fine-tune large models. huggingface.co/blog/peft Colab to fine-tune OPT-6.7B in Int8 below 🧵
136K
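The "1 GB per 1B parameters" figure in the 8-bit post above follows from simple arithmetic: one byte per weight. A minimal sketch of that estimate is below. The function name `weight_memory_gb` is hypothetical, and the estimate covers the stored weights only; activations, gradients, optimizer state, and any adapter parameters are assumed to be extra and are not modeled here.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Rough size of the stored weights in GB (1 GB = 1e9 bytes).

    Illustrative only: ignores activations, gradients, optimizer state,
    and any quantization metadata such as scaling factors.
    """
    if num_params < 0 or bits_per_param <= 0:
        raise ValueError("num_params must be >= 0 and bits_per_param > 0")
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Compare a 1B and a 6.7B parameter model at common bit widths.
    for params in (1_000_000_000, 6_700_000_000):
        for bits in (16, 8, 4):
            size = weight_memory_gb(params, bits)
            print(f"{params / 1e9:.1f}B params at {bits}-bit: ~{size:.2f} GB")
```

The same helper gives a quick feel for why 4-bit loading roughly halves the weight footprint again compared with 8-bit.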
younes
@yb2698
Nov 16, 2022
The first trillion parameter model on the Hub 🤯 Today we are proud to announce the release of the first Mixture of Experts (MoE) 🧙 models into @huggingface transformers! You can now easily run, train, and explore this fascinating architecture in the Hugging Face ecosystem! ⬇️
younes
@yb2698
May 24, 2023
A huge day for open source! 🔥 You can now load models from @huggingface in 4-bit precision using load_in_4bit and the bitsandbytes library, with no performance degradation. Announcement notes here: huggingface.co/blog/4bit-tran… Useful resources below
Tim Dettmers
@Tim_Dettmers
May 24, 2023
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: arxiv.org/abs/2305.14314 Code+Demo: github.com/artidoro/qlora Samples: colab.research.google.com/drive/1kK6xasH… Colab: colab.research.google.com/drive/17XEqL1J…
142K
younes
@yb2698
Mar 9, 2023
Fine tune a 20B Language Model with RLHF using a 24GB consumer GPU? 🤯 It is now possible using TRL + PEFT! Check out the blogpost that explains how we achieve this step by step! Blogpost: huggingface.co/blog/trl-peft
115K
younes
@yb2698
Sep 22, 2023
New feature alert in the @huggingface ecosystem! Flash Attention 2 is now natively supported in Hugging Face transformers, and it works with PEFT training and quantization (GPTQ, QLoRA, LLM.int8). First pip install flash attention, then pass use_flash_attention_2=True when loading the model!
115K
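A hedged sketch of what the Flash Attention 2 post above describes: passing `use_flash_attention_2=True` when loading a model. The helper names `flash_attention_load_kwargs` and `load_with_flash_attention` are hypothetical. The sketch assumes `torch`, `transformers`, and the flash-attn package are installed and a supported CUDA GPU is available, and it restricts the dtype to half precision, which the flash-attn kernels require as far as I know. Newer transformers releases prefer `attn_implementation="flash_attention_2"` over this flag, so treat the flag as specific to the release the post refers to.

```python
def flash_attention_load_kwargs(dtype_name: str = "float16") -> dict:
    """Build keyword arguments for from_pretrained (illustrative helper)."""
    # Assumption: Flash Attention 2 kernels only run in fp16 or bf16.
    if dtype_name not in ("float16", "bfloat16"):
        raise ValueError("Flash Attention 2 expects float16 or bfloat16")
    return {"torch_dtype": dtype_name, "use_flash_attention_2": True}


def load_with_flash_attention(model_id: str, dtype_name: str = "float16"):
    """Load a causal LM with the flag from the post; needs a CUDA GPU."""
    # Imported lazily so the kwargs helper works without these packages.
    import torch
    from transformers import AutoModelForCausalLM

    kwargs = flash_attention_load_kwargs(dtype_name)
    # Resolve the dtype name ("float16") to the torch dtype object.
    kwargs["torch_dtype"] = getattr(torch, kwargs["torch_dtype"])
    return AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
```

Calling `load_with_flash_attention` with a model id of your choice would download the checkpoint and move nothing to the GPU by itself; placing the model on a device is left to the caller.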
younes
@yb2698
Jan 25, 2023
Interested in applying RLHF (Reinforcement Learning from Human Feedback)? Try out trl! At @huggingface we now officially support RLHF training using PPO (Proximal Policy Optimization). Easily train your model in a single- or multi-GPU setup. 🧵 github.com/lvwerra/trl
47K
younes
@yb2698
Apr 7, 2023
MatCha and DePlot from @GoogleAI ! 🧠 A set of foundation models for plots and charts that can perform complex visual reasoning tasks such as plot summarisation/VQA. When combined with instruction-tuned LMs, you can create interesting demos, such as the one below ↓
99K
younes
@yb2698
Nov 24, 2023
The IPO algorithm, a new method from Google DeepMind (arxiv.org/abs/2310.12036), has just been added to the Hugging Face TRL library! Try it out now by installing TRL from source and simply passing `loss_type="ipo"` when initializing DPOTrainer: huggingface.co/docs/trl/main/…
arxiv.org
A General Theoretical Paradigm to Understand Learning from Human...
The prevalent deployment of learning from human preferences through reinforcement learning (RLHF) relies on two important approximations: the first assumes that pairwise preferences can be...
242K
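To make the `loss_type="ipo"` switch in the post above more concrete, here is a minimal pure-Python sketch of the two per-example objectives as I understand them from the DPO and IPO papers. It is not the TRL implementation: the function names are hypothetical, inputs are plain floats (summed or averaged sequence log-probabilities, an assumption left to the caller), and `beta` plays the role of the regularization strength in both losses.

```python
import math


def preference_logit(policy_chosen_logp: float, policy_rejected_logp: float,
                     ref_chosen_logp: float, ref_rejected_logp: float) -> float:
    """Policy log-ratio minus reference log-ratio for one preference pair."""
    policy_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    return policy_logratio - ref_logratio


def dpo_sigmoid_loss(logit: float, beta: float) -> float:
    """DPO-style loss: -log(sigmoid(beta * logit)), computed stably."""
    x = beta * logit
    # softplus(-x) = max(-x, 0) + log1p(exp(-|x|)) avoids overflow.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def ipo_loss(logit: float, beta: float) -> float:
    """IPO-style loss: squared distance of the logit from 1 / (2 * beta)."""
    return (logit - 1.0 / (2.0 * beta)) ** 2


if __name__ == "__main__":
    # Hypothetical log-probabilities for one (chosen, rejected) pair.
    h = preference_logit(-10.0, -14.0, -11.0, -12.0)
    print("logit:", h)
    print("dpo loss:", dpo_sigmoid_loss(h, beta=0.1))
    print("ipo loss:", ipo_loss(h, beta=0.1))
```

The design difference the sketch highlights: the sigmoid loss keeps decreasing as the logit grows, while the squared loss is minimized at a finite target of `1 / (2 * beta)`.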
younes
@yb2698
Feb 14, 2023
BLIP-2 in 8-bit! 🧠 @salesforce has uploaded the first multi-modal chatbot on the Hugging Face Hub! 🤯 BLIP-2 was released and open-sourced last week by @salesforce; run the model in 8-bit and start chatting with it in a few lines of code! huggingface.co/spaces/hysts/B…
56K
younes
@yb2698
Feb 14, 2024
Mixtral on a free-tier Google Colab with AQLM 2-bit quantization! 🤯 Similarly to QuIP#, the AQLM quantization method makes it possible to squeeze LLMs down into an impressively compact format, with a peak memory of ~13GB for Mixtral! Notebook:
colab.research.google.com
aqlm_transformers.ipynb
Colaboratory notebook
17K
younes
@yb2698
Mar 3, 2023
You liked Flan-T5? 🍮 You'll like Flan-UL2 - now on Hugging Face - even more! Thanks @YiTayML @Google for making the weights of the Flan-UL2 model open-source! Repo: huggingface.co/google/flan-ul2 Spaces: huggingface.co/spaces/ybelkad… Inference endpoint: huggingface.co/inference-endp… 🧵
101K
younes
@yb2698
Dec 25, 2023
Following up on the great work from the community that enabled bitsandbytes 4-bit serialization, I pushed Mixtral-Instruct-bnb-4bit to @huggingface for anyone who wants to easily load the model
ybelkada/Mixtral-8x7B-Instruct-v0.1-bnb-4bit · Hugging Face
From huggingface.co
50K
younes
@yb2698
Nov 10, 2022
Did you know that you can load @OpenAI's Whisper model in 8-bit using LLM.int8() from bitsandbytes & @TimDettmers? How does this quantization technique affect the performance of the model? @ArthurZucker ran some evaluations with 8-bit models and here are the results ⬇️