🦅I'm elated to join the technical team at the U.S. AI Safety Institute @NIST! AISI is a diverse team of experts in AI/ML, tech policy, and more, and I feel we have a fantastic opportunity to help the United States lead on the science, standards, and coordination of AI safety.
Gabriel Mukobi
512 posts
AI security researcher. Opinions are my own.
- Proud to start this month as a research fellow at 🟪@RANDCorporation to advance technical AI governance and in the fall as a CS PhD student at 🐻@UCBerkeley advised by @JacobSteinhardt and @dawnsongtweets! 🏛️I'm also in Washington, DC, until late August if anyone wants to meet!
- Replying to @DisruptiveBytes and @stats_feed: Was this written by a language model?
- 🧑🏽💻I started a personal blog (separate from my AI strategy blog)! The first post is on "ML Safety Research Advice," or advice for careers in empirical ML research that might help AI safety. Also aimed at AI governance researchers who want to learn more about ML safety. 1/3
- 📝New blog post, "Four Phases of AGI," on my personal AI strategy blog! In this post, I propose a framework for thinking about AGI progression and its implications for #AIGovernance. Check it out!⤵️ 1/15
- 🛡️AI risk management needs defense in depth--not just guardrails or controlling access to frontier models, but also societal adaptation. I'm excited for our new paper to contribute to this landscape and for follow up research and policy to help society adapt to advanced AI! Increasingly advanced AI systems will diffuse into society. How do we manage the accompanying risks? In our paper, we explore Societal Adaptation to Advanced AI: reducing harm from diffusion of AI capabilities by intervening to avoid, defend against, and remedy harmful use.
- Happy to have contributed to the Safetywashing paper! We can and should scientifically assess how well safety benchmarks track safety goals 📈 Do AI safety benchmarks actually measure safety progress? We find ~50% do not, showing safety research is fairly dysfunctional. We hope this work replaces vague arguments with scientific analysis to determine if a line of research makes DL systems safer. arxiv.org/abs/2407.21792
- ⚔️📈 Super excited to finally release our paper "Escalation Risks from Language Models in Military and Diplomatic Decision-Making" on arXiv! Do LLMs lead to more escalation in high-stakes international and military decision-making? Our new paper studies five off-the-shelf models and their behavior as autonomous agents in real-world conflict scenarios! A🧵 arxiv.org: Escalation Risks from Language Models in Military and Diplomatic... Governments are increasingly considering integrating autonomous AI agents in high-stakes military and foreign-policy decision-making, especially with the emergence of advanced generative AI models...
- Ready for NeurIPS! 🔥🌱🌊 Lmk if you want to meet up this week!
- 📜Delighted to have contributed to "Open Problems in Technical AI Governance"! Pure ML safety and alignment may look pretty doomed these days, but ML, hardware, and security researchers can instead contribute to hundreds of open technical questions that improve AI governance! Our new paper "Open Problems in Technical AI Governance" led by @ben_s_bucknall & me is out! We outline 89 open technical issues in AI governance, plus resources and 100+ research questions that technical experts can tackle to help AI governance efforts🧵 t.ly/Y-mQ1
- Replying to @MichaelTrazzi: Lol I wish I was that good at prediction and not just unfortunate luck 🙃
- Super excited for our paper to contribute to the emerging research landscape around a scientific, transparent, and predictable understanding of AI model evaluations! 📈❤️🔥❤️🔥 Excited to share our new paper ❤️🔥❤️🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?** w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo arxiv.org/abs/2406.04391 1/N
- I got an offer from Google, and I took it. I'm elated to be interning as a Googler in Engineering Practicum this summer! instagram.com/p/BruKVPjHoLox…