Aston Zhang
208 posts
Aston Zhang
@astonzhangAZ
Pre-training @OpenAI
San Francisco
astonzhang.com
Joined December 2018
97 Following
9,393 Followers
  • Aston Zhang
    @astonzhangAZ
    Jul 23, 2024
    Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper: 🔹Llama 3.1 405B, continuously trained with a 128K context
  • Aston Zhang
    @astonzhangAZ
    Apr 18, 2024
    Llama 3 has been my focus since joining the Llama team last summer. Together, we've been tackling challenges across pre-training and human data, pre-training scaling, long context, post-training, and evaluations. It's been a rigorous yet thrilling journey: 🔹Our largest models
  • Aston Zhang
    @astonzhangAZ
    Dec 15, 2022
    Our deep learning book goes 1.0 beta🎉 📕Forthcoming from Cambridge Univ Press (w @PyTorch) 🆕JAX implementation 🆕Reinforcement Learning 🆕Gaussian Processes 🆕Hyperparameter Optimization - Thanks to our 500 contributors & the 400 universities that chose it! - Free at D2L.ai (1/5)
  • Aston Zhang
    @astonzhangAZ
    Feb 3, 2023
    "Imagine learning a textbook with no figures." Multimodal chain-of-thought (Multimodal-CoT) in Language Models - Outperforms GPT-3.5 by 16 percentage points (75% -> 91%) and surpasses human performance on ScienceQA - Fewer than 1B params (so you can train it more easily) - Code & model released [1/6]
  • Aston Zhang
    @astonzhangAZ
    Nov 30, 2023
    Thrilled that our 'Dive into Deep Learning' book, now published by Cambridge University Press, is the Top New Release on Amazon! To ensure accessibility and affordability, we, the authors, have waived our royalties. Plus, it's always available for free at D2L.ai
  • Aston Zhang
    @astonzhangAZ
    Oct 7, 2024
    🚀 Exciting internship opportunity! Join the Llama team @AIatMeta and help redefine what's possible with large language models—from pre-training to post-training. Be part of our 2025 research internship and help shape the future of LLMs. Feel free to email or DM me 📩 Learn
  • Aston Zhang
    @astonzhangAZ
    Oct 10, 2022
    Cheer AI up with "let's think step by step"? More, please! Let's think not just step by step, but also one by one. We can use more cheers & diversity to SAVE huge manual effort in chain-of-thought prompt design, matching or even exceeding the performance of manual design on GPT-3 [1/7]
  • Aston Zhang
    @astonzhangAZ
    Dec 4, 2023
    🚀 Exciting internship opportunity! Our GenAI @Meta team is diving into various facets of #llama, from data to pretraining & finetuning. Passionate about large language models? Join us for a 2024 research internship. Feel free to email or DM me 📩
  • Aston Zhang
    @astonzhangAZ
    Oct 1, 2024
    🚀 New paper from our Llama team @AIatMeta! We discuss "cross capabilities" and "Law of the Weakest Link" of large language models (LLMs): 🔹 Cross capabilities: the intersection of multiple distinct capabilities across different types of expertise necessary to address complex,
  • Aston Zhang
    @astonzhangAZ
    Jan 6, 2023
    Don't assign the SAME parameter-efficient fine-tuning strategy to DIFFERENT layers. New tips: - Group layers in a SPINDLE pattern (e.g., 4-8-8-4 layers) - Allocate params to layers uniformly - Tune all groups - Adjust tuning strategies for different groups @AmazonScience @stanfordnlp [1/4]
  • Aston Zhang
    @astonzhangAZ
    Jul 20, 2022
    #ICML Long Oral! @AmazonScience Out-of-Distribution (OOD) Detection in Long-Tailed Recognition 📉 Existing OOD detection methods fail when the training data are long-tail distributed 📈 Ours: SOTA on long-tailed ImageNet Paper: arxiv.org/pdf/2207.01160… Code: github.com/amazon-researc… 1/
  • Aston Zhang
    @astonzhangAZ
    Apr 18, 2024
    Replying to @SwayStar123
    We are making Llama 3 multimodal
  • Aston Zhang
    @astonzhangAZ
    Dec 5, 2022
    Although our D2L.ai book is free online, many readers have been requesting hard copies to rest their tired eyes. So excited to announce: ✅ English publication agreement with @CambridgeUP was signed @AmazonScience ✅ Chinese 2nd edition was sent to print. Both in @PyTorch
  • Aston Zhang
    @astonzhangAZ
    Dec 22, 2022
    If your prompt tuning can't converge easily, make it semi-parametric. 🆕Memory prompt: input-adaptive, with no need to tune the memory prompt ✅Full fine-tuning on 31 tasks -> zero-shot generalization ✅Parameter-efficient fine-tuning on GLUE -> task transferability on SuperGLUE [1/4]
    SPT (Semi-parametric Prompt Tuning)