younes
@yb2698
1,613 posts
Joined July 2022
322 Following
3,780 Followers
  • younes
    @yb2698
    Jul 18, 2023
    Llama-2 was just released by @Meta AI and you can already use it in the @huggingface ecosystem. How do you fine-tune the model on your own data? We are releasing a simple fine-tuning script for single- and multi-GPU setups to get you started in a few lines of code: gist.github.com/younesbelkada/…
    135K
  • younes
    @yb2698
    Feb 10, 2023
    You asked for it. You can now fine-tune a model that has been loaded in 8-bit. With 8-bit fine-tuning, every 1B parameters costs only 1 GB of GPU RAM, making it easy to fine-tune any large model. huggingface.co/blog/peft Colab to fine-tune OPT-6.7B in Int8 below 🧵
    136K
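The "1 GB per 1B parameters" figure in the post above follows from storing each weight in one byte. A minimal sketch of that arithmetic, assuming only weight storage is counted (activations, optimizer state, and adapter weights are ignored) and taking 1 GB as 10^9 bytes. The helper name `weight_memory_gb` is hypothetical and not part of any library:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_param = bits_per_param / 8
    return num_params * bytes_per_param / 1e9


# 1B parameters stored at 8 bits per weight
one_b_int8 = weight_memory_gb(1e9, 8)

# OPT-6.7B (the model named in the post) at 8 bits versus 16 bits per weight
opt_int8 = weight_memory_gb(6.7e9, 8)
opt_fp16 = weight_memory_gb(6.7e9, 16)

print(f"1B params in int8:  {one_b_int8:.1f} GB")
print(f"OPT-6.7B in int8:   {opt_int8:.1f} GB")
print(f"OPT-6.7B in fp16:   {opt_fp16:.1f} GB")
```

Actual GPU usage during fine-tuning will be higher than these weight-only numbers, so treat them as a lower bound.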
  • younes
    @yb2698
    Nov 16, 2022
    The first trillion-parameter model on the Hub 🤯 Today we are proud to announce the release of the first Mixture of Experts (MoE) 🧙 models into @huggingface transformers! You can now easily run, train, and explore this fascinating architecture in the Hugging Face ecosystem! ⬇️
  • younes
    @yb2698
    May 24, 2023
    A huge day for open source! 🔥 You can now load models from @huggingface in 4-bit precision using load_in_4bit and the bitsandbytes library, with no performance degradation. Announcement notes here: huggingface.co/blog/4bit-tran… Useful resources below
    Tim Dettmers
    @Tim_Dettmers
    May 24, 2023
    QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark: Paper: arxiv.org/abs/2305.14314 Code+Demo: github.com/artidoro/qlora Samples: colab.research.google.com/drive/1kK6xasH… Colab: colab.research.google.com/drive/17XEqL1J…
    142K
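As a rough illustration of the 4-bit loading described above, here is a minimal sketch using `BitsAndBytesConfig` from transformers. It assumes `transformers`, `torch`, and `bitsandbytes` are installed and a CUDA GPU is available. The helper name `load_4bit` is hypothetical, and the NF4 quantization type and bfloat16 compute dtype are assumptions, not settings stated in the posts:

```python
def load_4bit(model_id: str):
    """Load a causal language model with 4-bit bitsandbytes quantization."""
    # Imported lazily so that defining this helper does not require
    # the heavy dependencies to be installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # store weights in 4-bit
        bnb_4bit_quant_type="nf4",              # assumption: NF4 data type
        bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 for matmuls
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate place the layers on available devices
    )
    return tokenizer, model
```

Calling `load_4bit("facebook/opt-350m")` would be one way to try it on a small model; whether quality is preserved for a given model and task should be checked empirically.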
  • younes
    @yb2698
    Mar 9, 2023
    Fine-tune a 20B language model with RLHF using a 24GB consumer GPU? 🤯 It is now possible using TRL + PEFT! Check out the blog post that explains how we achieve this step by step! Blog post: huggingface.co/blog/trl-peft
    115K
  • younes
    @yb2698
    Sep 22, 2023
    New feature alert in the @huggingface ecosystem! Flash Attention 2 is now natively supported in Hugging Face transformers, including PEFT training and quantization (GPTQ, QLoRA, LLM.int8). First pip install flash attention, then pass use_flash_attention_2=True when loading the model!
    115K
  • younes
    @yb2698
    Jan 25, 2023
    Interested in applying RLHF (Reinforcement Learning from Human Feedback)? Try out trl! At @huggingface we now officially support RLHF training using PPO (Proximal Policy Optimization). Easily train your model in a single- or multi-GPU setup. 🧵 github.com/lvwerra/trl
    47K
  • younes
    @yb2698
    Apr 7, 2023
    MatCha and DePlot from @GoogleAI! 🧠 A set of foundation models for plots and charts that can perform complex visual reasoning tasks such as plot summarisation and VQA. When combined with instruction-tuned LMs, you can create interesting demos, such as the one below ↓
    99K
  • younes
    @yb2698
    Nov 24, 2023
    The IPO algorithm, a new method from Google DeepMind (arxiv.org/abs/2310.12036), has just been added to the Hugging Face TRL library! Try it out now by installing TRL from source and passing `loss_type="ipo"` when initializing DPOTrainer: huggingface.co/docs/trl/main/…
    arxiv.org
    A General Theoretical Paradigm to Understand Learning from Human...
    The prevalent deployment of learning from human preferences through reinforcement learning (RLHF) relies on two important approximations: the first assumes that pairwise preferences can be...
    242K
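To make the `loss_type="ipo"` option above more concrete, here is a minimal pure-Python sketch of the per-example IPO objective: the squared distance between the policy-versus-reference log-ratio gap and the target `1 / (2 * beta)`. This is an illustration written from the squared-error form in the linked paper, not code taken from TRL. The function name `ipo_loss`, its argument names, and the default `beta` are assumptions:

```python
def ipo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Per-example IPO loss from sequence log-probabilities."""
    # How much more the policy prefers the chosen completion than the rejected one
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    # The same preference gap under the frozen reference model
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    # Gap between policy and reference preferences
    h = pi_logratio - ref_logratio
    # IPO regresses this gap toward 1 / (2 * beta) instead of pushing it to infinity
    return (h - 1.0 / (2.0 * beta)) ** 2


# Policy identical to the reference: the gap is 0, so the loss is (1 / (2 * beta))^2
loss_at_init = ipo_loss(-3.0, -3.0, -3.0, -3.0, beta=0.1)

# Policy whose gap already equals the target 1 / (2 * beta) = 5: the loss vanishes
loss_at_target = ipo_loss(-1.0, -6.0, -3.0, -3.0, beta=0.1)
```

Because the loss is minimized at a finite gap, the policy is not rewarded for drifting arbitrarily far from the reference model.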
  • younes
    @yb2698
    Feb 14, 2023
    BLIP-2 in 8-bit! 🧠 @salesforce has uploaded the first multi-modal chatbot on the Hugging Face Hub! 🤯 BLIP-2 was released and open-sourced last week by @salesforce. Run the model in 8-bit and start chatting with it in a few lines of code! huggingface.co/spaces/hysts/B…
    56K
  • younes
    @yb2698
    Feb 14, 2024
    Mixtral on a free-tier Google Colab with AQLM 2-bit quantization! 🤯 Like QuIP#, the AQLM quantization method makes it possible to squeeze LLMs down into an impressively compact format, with a peak memory of ~13GB for Mixtral! Notebook:
    colab.research.google.com
    aqlm_transformers.ipynb
    Colaboratory notebook
    17K
  • younes
    @yb2698
    Mar 3, 2023
    You liked Flan-T5? 🍮 You'll like Flan-UL2 - now on Hugging Face - even more! Thanks @YiTayML @Google for making the weights of the Flan-UL2 model open-source! Repo: huggingface.co/google/flan-ul2 Spaces: huggingface.co/spaces/ybelkad… Inference endpoint: huggingface.co/inference-endp… 🧵
    101K
  • younes
    @yb2698
    Dec 25, 2023
    Following up on the great work from the community that enabled bitsandbytes 4-bit serialization, I pushed Mixtral-Instruct-bnb-4bit to @huggingface for anyone who wants to easily load the model.
    ybelkada/Mixtral-8x7B-Instruct-v0.1-bnb-4bit · Hugging Face
    From huggingface.co
    50K
  • younes
    @yb2698
    Nov 10, 2022
    Did you know that you can load @OpenAI's Whisper model in 8-bit using LLM.int8() from bitsandbytes & @TimDettmers? How does this quantization technique affect the performance of the model? @ArthurZucker ran some evaluations with 8-bit models and here are the results ⬇️
