Summary of GPT-OSS architectural innovations:
1. sliding window attention (ref: arxiv.org/abs/1901.02860)
2. mixture of experts (ref: arxiv.org/abs/2101.03961)
3. RoPE w/ YaRN (ref: arxiv.org/abs/2309.00071)
4. attention sinks (ref: StreamingLLM arxiv.org/abs/2309.17453)
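
To make items 1 and 4 concrete, here is a minimal sketch of how a causal sliding window and attention sinks can combine into one attention mask. It follows the StreamingLLM-style idea of always keeping the first few tokens visible; this is an assumption for illustration, not necessarily how GPT-OSS implements sinks. The function name `sink_window_mask` and the parameters `window` and `num_sinks` are hypothetical.

```python
def sink_window_mask(seq_len: int, window: int, num_sinks: int) -> list[list[bool]]:
    """Return mask[q][k], True when query position q may attend to key position k.

    Hypothetical helper for illustration only:
    - causal: a query never sees keys that come after it
    - sliding window: a query sees only the most recent `window` keys (itself included)
    - sinks: the first `num_sinks` keys stay visible to every later query
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            causal = k <= q
            in_window = (q - k) < window
            is_sink = k < num_sinks
            row.append(causal and (in_window or is_sink))
        mask.append(row)
    return mask


mask = sink_window_mask(seq_len=6, window=2, num_sinks=1)
for row in mask:
    # "x" marks an allowed (query, key) pair, "." a masked one
    print("".join("x" if allowed else "." for allowed in row))
# prints:
# x.....
# xx....
# xxx...
# x.xx..
# x..xx.
# x...xx
```

The first column stays "x" for every row: that is the sink token, still attendable after it has slid out of the window.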
Graham Neubig
Associate professor @LTIatCMU.
Co-founder/chief scientist @OpenHandsDev.
I mostly work on modeling language.
- So apparently the cool pictures of the black hole today are from the algorithm in Bouman et al. 2016 (cv-foundation.org/openaccess/con…), a CVPR paper that has been cited a total of 11 times. Citations are not necessarily an indication of impactful work, esp. multidisciplinary work!
- Google’s Gemini recently made waves as a major competitor to OpenAI’s GPT. Exciting! But we wondered: How good is Gemini really? At CMU, we performed an impartial, in-depth, and reproducible study comparing Gemini, GPT, and Mixtral. Paper: arxiv.org/abs/2312.11444 🧵
- 2021 version of CMU "Neural Networks for NLP" slides (phontron.com/class/nn4nlp20…) and videos (youtube.com/playlist?list=…) are being posted in real time! Check it out for a comprehensive graduate-level class on NLP! New this year: assignment on implementing parts of your own NN toolkit.
- I had to travel 26 hours and spend $2000+ to join #ICLR2023 in Rwanda. But people in Africa have to do this every time a conference is held in US. What happens when we make it easier to participate? 1530% higher registrations from Africa. This is important and must continue.
- I've finished uploading the lecture videos for CMU CS11-747 "Neural Networks for NLP"'s 2020 edition: youtube.com/playlist?list=… Check it out if you're interested in a comprehensive graduate-level course on modern NLP methods!
- OpenAI recently added a method to make asynchronous calls, which is good if you want many calls quickly. But it’s not super-well-documented, so I wrote a quick demo of how to make many calls at once, e.g. 100+ in a few seconds. Hope it's helpful! gist.github.com/neubig/80de662…
- I have a joke about neural language models. I have a joke about neural language models. I have a joke about neural language models. I have a joke about neural language models.
- How far are we from having competent AI co-workers that can perform tasks as varied as software development, project management, administration, and data science? In our new paper, we introduce TheAgentCompany, a benchmark for AI agents on consequential real-world tasks.
- Finished uploading all videos for 2019 edition of CMU CS11-747 "Neural Networks for NLP": youtube.com/watch?v=pmcXgN… Like other offerings (e.g. Stanford CS224n) it covers basics, but it's also a grad course with more topics, so it might be a good choice if you want to go deeper!
- We've started the Fall 2022 edition of: 🎓CMU CS11-711 Advanced NLP!🎓 Follow along for * An intro of core topics * Timely content; prompting, retrieval, bias/fairness * Content on NLP research methodology Page: phontron.com/class/anlp2022/ Videos: youtube.com/playlist?list=…
- Announcement: @rbren_dev, @xingyaow_, and I have formed a company! Our name is All Hands AI 🙌 all-hands.dev And our mission is to build the world’s best AI software development agents, for everyone, in the open. Here’s why I think this mission is important 🧵
- I created a Python project starter repo for students that helps maintain good code quality while doing research projects: github.com/neubig/starter… I was opinionated and made only one choice for each tool, but there are other options too!
- We started the Fall 2024 version of CMU CS11-711 Advanced NLP🎓 Follow along to learn about the latest in NLP, LLMs, Agents, etc. * Materials: phontron.com/class/anlp-fal… * Videos: youtube.com/playlist?list=…









