Aston Zhang (@astonzhangAZ) / X

Aston Zhang

@astonzhangAZ

Pre-training @OpenAI

San Francisco

Joined December 2018

Aston Zhang
@astonzhangAZ
Jul 23, 2024
Our Llama 3.1 405B is now openly available! After a year of dedicated effort, from project planning to launch reviews, we are thrilled to open-source the Llama 3 herd of models and share our findings through the paper: 🔹Llama 3.1 405B, continuously trained with a 128K context
423K
Aston Zhang
@astonzhangAZ
Apr 18, 2024
Llama 3 has been my focus since joining the Llama team last summer. Together, we've been tackling challenges across pre-training and human data, pre-training scaling, long context, post-training, and evaluations. It's been a rigorous yet thrilling journey: 🔹Our largest models
394K
Aston Zhang
@astonzhangAZ
Dec 15, 2022
Our deep learning book goes 1.0 beta 🎉
📕 Forthcoming on Cambridge Univ Press (w/ @PyTorch)
🆕 JAX implementation
🆕 Reinforcement Learning
🆕 Gaussian Processes
🆕 Hyperparameter Optimization
- Thanks to 500 contributors & 400 univs for choosing it!
- Free at D2L.ai
(1/5)
Aston Zhang
@astonzhangAZ
Feb 3, 2023
"Imagine learning a textbook with no figures."
Multimodal chain-of-thought (Multimodal-CoT) in Language Models
- Outperforms GPT-3.5 by 16 points (75% -> 91%) and surpasses human performance on ScienceQA
- Less than 1B params (so you can train more easily)
- Code & model released
[1/6]
98K
Aston Zhang
@astonzhangAZ
Nov 30, 2023
Thrilled that our 'Dive into Deep Learning' book, now published by Cambridge University Press, is the Top New Release on Amazon! To ensure accessibility and affordability, we, the authors, have waived our royalties. Plus, it's always available for free at D2L.ai
51K
Aston Zhang
@astonzhangAZ
Oct 7, 2024
🚀 Exciting internship opportunity! Join the Llama team @AIatMeta and help redefine what's possible with large language models—from pre-training to post-training. Be part of our 2025 research internship and help shape the future of LLMs. Feel free to email or DM me 📩 Learn
32K
Aston Zhang
@astonzhangAZ
Oct 10, 2022
Cheer AI up with "let's think step by step"? More plz. Let's think not just step by step, but also one by one. We can use more cheers & diversity to SAVE huge manual effort in chain-of-thought prompt design, matching or even exceeding the performance of manual design on GPT-3 [1/7]
Aston Zhang
@astonzhangAZ
Dec 4, 2023
🚀 Exciting internship opportunity! Our GenAI @Meta team is diving into various facets of #llama, from data to pretraining & finetuning. Passionate about large language models? Join us for a 2024 research internship. Feel free to email or DM me 📩
51K
Aston Zhang
@astonzhangAZ
Oct 1, 2024
🚀 New paper from our Llama team @AIatMeta! We discuss "cross capabilities" and "Law of the Weakest Link" of large language models (LLMs): 🔹 Cross capabilities: the intersection of multiple distinct capabilities across different types of expertise necessary to address complex,
28K
Aston Zhang
@astonzhangAZ
Jan 6, 2023
Don't assign the SAME parameter-efficient fine-tuning strategy to DIFFERENT layers.
New tips:
- Group layers in a SPINDLE pattern (e.g., 4-8-8-4 layers)
- Allocate params to layers uniformly
- Tune all groups
- Adjust tuning strategies for different groups
@AmazonScience @stanfordnlp [1/4]
22K
Aston Zhang
@astonzhangAZ
Jul 20, 2022
#ICML Long Oral! @AmazonScience
Out-of-Distribution (OOD) Detection in Long-Tailed Recognition
📉 Existing OOD detection fails when training data is long-tail distributed
📈 Ours: SOTA on long-tailed ImageNet
Paper: arxiv.org/pdf/2207.01160…
Code: github.com/amazon-researc…
1/
Aston Zhang
@astonzhangAZ
Apr 18, 2024
Replying to @SwayStar123
We are making Llama 3 multimodal
8.2K
Aston Zhang
@astonzhangAZ
Dec 5, 2022
Although our D2L.ai book is free online, many readers have been requesting hard copies for tired eyes. So excited to announce:
✅ English publication agreement with @CambridgeUP was signed @AmazonScience
✅ Chinese 2nd edition was sent to print
Both in @PyTorch
Aston Zhang
@astonzhangAZ
Dec 22, 2022
If your prompt tuning can't converge easily, make it semi-parametric.
🆕 Memory prompt: input-adaptive, but no need for memory prompt tuning
✅ Full fine-tuning on 31 tasks -> zero-shot generalization
✅ Parameter-efficient fine-tuning on GLUE -> task transferability on SuperGLUE
[1/4]
13K