Tom Jobbins
336 posts
Tom Jobbins
@TheBlokeAI
My Hugging Face repos: huggingface.co/TheBloke Discord server: discord.gg/theblokeai Patreon: patreon.com/TheBlokeAI
UK
Joined July 2010
224
Following
15.5K
Followers
  • Tom Jobbins
    @TheBlokeAI
    Jun 12, 2023
    New PR in at llama.cpp for full CUDA GPU acceleration! github.com/ggerganov/llam… This is huge! For the first time GGML is beating GPTQ speed. On a 4090 + i9-13900K I'm getting 109.29 tokens/s on 7B and 29.11 tokens/s on 30B. AutoGPTQ is: 98 t/s for 7B, 35 t/s for 30B.
    CUDA full GPU acceleration, KV cache in VRAM by JohannesGaessler · Pull Request #1827 · ggml-org/...
    From github.com
    166K
  • Tom Jobbins
    @TheBlokeAI
    Jun 14, 2023
    New StarCoder coding model from @WizardLM_AI "WizardCoder-15B-v1.0 model achieves 57.3 pass@1 on the HumanEval Benchmarks .. 22.3 points higher than the SOTA open-source Code LLMs." My quants: huggingface.co/TheBloke/Wizar… huggingface.co/TheBloke/Wizar… Original: huggingface.co/WizardLM/Wizar…
    TheBloke/WizardCoder-15B-1.0-GGML · Hugging Face
    From huggingface.co
    393K
  • Tom Jobbins
    @TheBlokeAI
    Aug 24, 2023
    Meta's CodeLlama is here! ai.meta.com/blog/code-llam… 7B, 7B-Instruct, 7B-Python, 13B, 13B-Instruct, 13B-Python, 34B, 34B-Instruct, 34B-Python. First time we've seen the 34B model. I've got a couple of fp16s up: huggingface.co/TheBloke/CodeL… huggingface.co/TheBloke/CodeL… More coming soon obvs
    ai.meta.com
    Introducing Code Llama, a state-of-the-art large language model for coding
    Code Llama, which is built on top of Llama 2, is free for research and commercial use.
    25K
  • Tom Jobbins
    @TheBlokeAI
    Jul 23, 2023
    Llama 2 70B GGML support is here! Use this llama.cpp release: github.com/ggerganov/llam… My first repo is at: huggingface.co/TheBloke/Llama… Note: at this time it's only possible to convert the base Llama 2 models, not any fine tunes. This is being worked on.
    Release master-e76d630 · ggml-org/llama.cpp
    From github.com
    57K
  • Tom Jobbins
    @TheBlokeAI
    Jul 18, 2023
    Oh my, LLaMA 2! 7B, 13B, 70B, 2T tokens, 4K context, commercial license! huggingface.co/meta-llama But why, Meta, why no 33B or similar size? You missed out the sweet spot? :( Unless with 2T tokens and 4K context, 13B proves more than good enough.. could be!
    meta-llama (Meta Llama)
    From huggingface.co
    53K
  • Tom Jobbins
    @TheBlokeAI
    May 25, 2023
    I've uploaded merged/quantised versions of all of @Tim_Dettmers ' Guanaco models: huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… huggingface.co/TheBloke/guana… phew!
    TheBloke/guanaco-7B-GPTQ · Hugging Face
    From huggingface.co
    34K
  • Tom Jobbins
    @TheBlokeAI
    Aug 23, 2023
    Transformers 4.32.0 now supports GPTQ models natively! Over the last couple of days I have updated 296 of my GPTQ repos to provide automatic support for this. It's awesome you can now load a GPTQ model directly in Transformers with only two lines of code!
    Marc Sun
    @_marcsun
    Aug 23, 2023
    LLMs just got faster and lighter with 🤗 Transformers x AutoGPTQ ! You can now load your models from @huggingface with GPTQ quantization. Enjoy faster inference speed and lower memory usage than existing supported quantization schemes 🚀 Blogpost: huggingface.co/blog/gptq-inte…
    37K
  • Tom Jobbins
    @TheBlokeAI
    Jul 6, 2023
    I've just quantised my largest ever models! BLOOMZ 176B and BLOOMChat 176B! huggingface.co/TheBloke/bloom… huggingface.co/TheBloke/BLOOM… Took a month before I found a system big enough. But thanks to @latitudesh and their beast 4xH100 80GB, EPYC 9354 750GB, I did each model in <4 hours! 🚀
    TheBloke/bloomz-176B-GPTQ · Hugging Face
    From huggingface.co
    38K
  • Tom Jobbins
    @TheBlokeAI
    Sep 24, 2023
    Thanks again to @latitudesh for the loan of a beast 8xH100 server this week. I uploaded over 550 new repos, maybe my busiest week yet! Quanting is really resource intensive. Needs not only fast GPUs, but many CPUs, lots of disk, and 🚀 network. A server that ✅ all is v. rare!
    32K
  • Tom Jobbins
    @TheBlokeAI
    Jul 7, 2023
    The other day I discovered a little environment variable buried in the @huggingface Hub Python docs: HF_HUB_ENABLE_HF_TRANSFER. It has changed my life! Docs say 2x faster, but in my testing it's 3-5x faster 🚀😍 (and it's just as fast for uploads!)
    82K
  • Tom Jobbins
    @TheBlokeAI
    Jul 11, 2023
    I have reached quantisation nirvana.. making 9 GPTQs at once! This @latitudesh server is a monster, and it is always hungry! 👹
    19K
  • Tom Jobbins
    @TheBlokeAI
    Jun 11, 2023
    New models from Allen AI: Tulu 30B, 13B, 7B. LLaMa models tuned on a mix of datasets, e.g. FLAN V2, CoT, Dolly, OAST, GPT4-Alpaca, ShareGPT. huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-… huggingface.co/TheBloke/tulu-…
    TheBloke/tulu-30B-GPTQ · Hugging Face
    From huggingface.co
    40K
  • Tom Jobbins
    @TheBlokeAI
    May 27, 2023
    New WizardLM model, now in 13B! Trained on 250k 'evolved instructions' from ShareGPT and recorded as matching or beating GPT4 on multiple benchmarks (not all, of course :) ) I've merged and quantised here: huggingface.co/TheBloke/wizar… huggingface.co/TheBloke/wizar… huggingface.co/TheBloke/wizar…
    TheBloke/WizardLM-13B-1.0-GGML · Hugging Face
    From huggingface.co
    32K
  • Tom Jobbins
    @TheBlokeAI
    May 28, 2023
    An interesting new special model! Gorilla enables LLMs to use tools by invoking APIs. Project website: shishirpatil.github.io/gorilla/ My uploads: huggingface.co/TheBloke/goril… huggingface.co/TheBloke/goril… huggingface.co/TheBloke/goril…
    34K
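
The Aug 23, 2023 post in the timeline above says a GPTQ model can be loaded directly in Transformers with only two lines of code. A minimal sketch of what that might look like follows. It assumes `transformers` 4.32.0 or later with the `optimum`, `auto-gptq` and `accelerate` packages installed and a CUDA GPU available. The repo ID is an illustrative choice, not one named in the post, and `load_gptq` is a hypothetical helper name.

```python
def load_gptq(repo_id: str = "TheBloke/Llama-2-7b-Chat-GPTQ"):
    """Load a GPTQ-quantised model and its tokenizer through plain Transformers.

    The default repo ID is an assumption made for illustration only.
    """
    # Imported lazily so this sketch can be imported without the heavy
    # dependencies installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The "two lines": the quantisation settings are read from the repo's
    # config, so no GPTQ-specific arguments are passed here.
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    return tokenizer, model
```

Calling `tokenizer, model = load_gptq()` downloads several gigabytes of weights on first use, so the function is defined here without being invoked.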

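The Jul 7, 2023 post in the timeline above mentions the `HF_HUB_ENABLE_HF_TRANSFER` environment variable. A hedged sketch of how it might be switched on from Python follows. It assumes the optional `hf_transfer` package is installed alongside `huggingface_hub`. The helper names `enable_hf_transfer` and `download_repo` are hypothetical, and the speed-up reported in the post is the author's own measurement, not something this code verifies.

```python
import os


def enable_hf_transfer() -> None:
    """Opt in to the hf_transfer backend for Hugging Face Hub transfers."""
    # Set this before huggingface_hub is imported: to my understanding the
    # library reads the variable when it is first imported.
    os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"


def download_repo(repo_id: str) -> str:
    """Download a whole Hub repo and return the local snapshot path."""
    enable_hf_transfer()
    # Imported only after the environment variable has been set.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)
```

Setting the variable in the shell before starting Python (`export HF_HUB_ENABLE_HF_TRANSFER=1`) avoids the import-order concern entirely.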