muhtasham
2,254 posts
muhtasham
@Muhtasham9
burning tokens like it's 1999
Latent Space
muhtasham.github.io/blog/
Joined March 2020
998 Following
1,792 Followers
  • Pinned
    muhtasham
    @Muhtasham9
    Jul 4, 2023
    w boss
    Image
    12K
  • muhtasham
    @Muhtasham9
    Mar 26, 2024
    A short thread about changes in the transformer architecture since 2017. Reading articles about LLMs, you can see phrases like “we use a standard transformer architecture.” But what does "standard" mean, and have there been changes since the original article? (1/6)
    Image
    muhtasham
    @Muhtasham9
    Aug 14, 2022
    Interestingly, despite 5 years(!) of hyper-growth in the NLP space, the vanilla Transformer is holding up to the Lindy effect, which is the idea that the older something is, the longer it's likely to be around in the future.
    135K
  • muhtasham
    @Muhtasham9
    Jan 15, 2024
    Evaluating abstractive summarization remains an open area for further improvement. If you have ever dealt with large-scale summarization evaluation, you know how tedious it is. Inspired by @eugeneyan's post on this topic, I hacked something together over the weekend to streamline this.
    Image
    43K
  • muhtasham
    @Muhtasham9
    Feb 4, 2023
    Excited to announce the most up-to-date and CPU-friendly BERT, trained on the most recent snapshot of the internet. Took a day and 8x A100s to train. 🤗 The model is open-source and I hope the community can benefit from it. It was created…lnkd.in/edQhXf3q lnkd.in/eM6nW38a
    35K
  • muhtasham
    @Muhtasham9
    Oct 23, 2022
    Meta: Multi-tasking while reading about Multi-task NLP models
    Image
  • muhtasham
    @Muhtasham9
    Mar 3, 2024
    StarCoder2 running on M2 8GB
    Image
    9.8K
  • muhtasham
    @Muhtasham9
    Mar 12, 2024
    DeepMind folks can now steal weights behind APIs: “We also recover the exact hidden dimension size of the gpt-3.5-turbo model, and estimate it would cost under $2,000 in queries to recover the entire projection matrix.” Who wants to do the same for GPT-4? arxiv.org/abs/2403.06634
    6.9K
  • muhtasham
    @Muhtasham9
    May 25, 2022
    Replying to @_jasonwei and @arankomatsuzaki
    Might contain a lot of subtle issues; see the Clever Hans effect, which is always hard to debug. The law of leaky abstractions in action, as my supervisor says.
    Image
    NLP's Clever Hans Moment has Arrived
    From thegradient.pub
  • muhtasham
    @Muhtasham9
    May 15, 2023
    Replying to @Mascobot
    muhtasham
    @Muhtasham9
    May 15, 2023
    🇺🇸US: Innovate then try to regulate
    🇪🇺EU: Regulate then try to innovate
    18K
  • muhtasham
    @Muhtasham9
    May 15, 2023
    🇺🇸US: Innovate then try to regulate
    🇪🇺EU: Regulate then try to innovate
    21K
  • muhtasham
    @Muhtasham9
    Mar 3, 2024
    The 🤗 MLX community is amazing
    Quantized StarCoder2 model variants available here: huggingface.co/mlx-community
    Small guide on running and training StarCoder2 locally
    pip install -U mlx-lm
    To run inference on quantized model
    python -m mlx_lm.generate --model
    BigCode
    @BigCodeProject
    Feb 28, 2024
    Introducing: StarCoder2 and The Stack v2 ⭐️ StarCoder2 is trained with a 16k token context and repo-level information for 4T+ tokens. All built on The Stack v2 - the largest code dataset with 900B+ tokens. All code, data and models are fully open! hf.co/bigcode/starco…
    Image
    Image
    mlx-community (MLX Community)
    From huggingface.co
    7.9K
  • muhtasham
    @Muhtasham9
    Feb 7, 2024
    Happy to show Pod-Helper:
    ⚡️ Lightning-speed transcription with Whisper
    🔧 Built-in audio repair with good old Roberta
    🧊 Checks your content's vibe effortlessly
    See demo below running on TensorRT-LLM
    #GenAIonRTX #DevContest #GTC24 @NVIDIAAIDev
    Image
    15K
  • muhtasham
    @Muhtasham9
    Nov 3, 2022
    Replying to @tszzl
    amasad.me/red Here is the PDF by @amasad
  • muhtasham
    @Muhtasham9
    Apr 30, 2023
    If you missed out on the @full_stack_dl LLM bootcamp, don't worry! I've written a blog post about it. I hope you find my post informative and enjoyable to read, just as I enjoyed attending the bootcamp.
    muhtasham.github.io
    A Deep Dive into the LLM Bootcamp Experience: Revolutionizing AI-Powered Applications – Koding...
    3.3K
