Gabriel Mukobi
512 posts
Gabriel Mukobi
@gabemukobi
AI security researcher. Opinions are my own.
San Francisco, CA
gabrielmukobi.com
Joined September 2017
244 Following
846 Followers
  • Gabriel Mukobi
    @gabemukobi
    Jul 27, 2024
    🦅I'm elated to join the technical team at the U.S. AI Safety Institute @NIST! AISI is a diverse team of experts in AI/ML, tech policy, and more, and I feel we have a fantastic opportunity to help the United States lead on the science, standards, and coordination of AI safety.
    16K
  • Gabriel Mukobi
    @gabemukobi
    Apr 20, 2024
    Proud to start this month as a research fellow at 🟪@RANDCorporation to advance technical AI governance and in the fall as a CS PhD student at 🐻@UCBerkeley advised by @JacobSteinhardt and @dawnsongtweets! 🏛️I'm also in Washington, DC, until late August if anyone wants to meet!
    5.2K
  • Gabriel Mukobi
    @gabemukobi
    Nov 27, 2023
    Replying to @DisruptiveBytes and @stats_feed
    Was this written by a language model?
    2.6K
  • Gabriel Mukobi
    @gabemukobi
    Jul 23, 2024
    🧑🏽‍💻I started a personal blog (separate from my AI strategy blog)! The first post is on "ML Safety Research Advice," or advice for careers in empirical ML research that might help AI safety. Also aimed at AI governance researchers who want to learn more about ML safety. 1/3
    1.7K
  • Gabriel Mukobi
    @gabemukobi
    Aug 5, 2024
    📝New blog post, "Four Phases of AGI," on my personal AI strategy blog! In this post, I propose a framework for thinking about AGI progression and its implications for #AIGovernance. Check it out!⤵️ 1/15
    1.6K
  • Gabriel Mukobi
    @gabemukobi
    Feb 5, 2024
    Replying to @ToughSf
    Uhhh does this detonate the bombs too?
    11K
  • Gabriel Mukobi
    @gabemukobi
    May 22, 2024
    🛡️AI risk management needs defense in depth--not just guardrails or controlling access to frontier models, but also societal adaptation. I'm excited for our new paper to contribute to this landscape and for follow up research and policy to help society adapt to advanced AI!
    Markus Anderljung
    @Manderljung
    May 22, 2024
    Increasingly advanced AI systems will diffuse into society. How do we manage the accompanying risks? In our paper, we explore Societal Adaptation to Advanced AI: reducing harm from diffusion of AI capabilities by intervening to avoid, defend against, and remedy harmful use.
    1.2K
  • Gabriel Mukobi
    @gabemukobi
    Aug 2, 2024
    Happy to have contributed to the Safetywashing paper! We can and should scientifically assess how well safety benchmarks track safety goals 📈
    Dan Hendrycks
    @hendrycks
    Aug 1, 2024
    Do AI safety benchmarks actually measure safety progress? We find ~50% do not, showing safety research is fairly dysfunctional. We hope this work replaces vague arguments with scientific analysis to determine if a line of research makes DL systems safer. arxiv.org/abs/2407.21792
    917
  • Gabriel Mukobi
    @gabemukobi
    Jan 10, 2024
    ⚔️📈 Super excited to finally release our paper "Escalation Risks from Language Models in Military and Diplomatic Decision-Making" on arXiv!
    Max Lamparth
    @MLamparth
    Jan 10, 2024
    Do LLMs lead to more escalation in high-stake international and military decision-making? Our new paper studies five off-the-shelf models and their behavior as autonomous agents in real-world conflict scenarios! A🧵
    arxiv.org
    Escalation Risks from Language Models in Military and Diplomatic...
    Governments are increasingly considering integrating autonomous AI agents in high-stakes military and foreign-policy decision-making, especially with the emergence of advanced generative AI models...
    1.5K
  • Gabriel Mukobi
    @gabemukobi
    Dec 11, 2023
    Ready for NeurIPS! 🔥🌱🌊 Lmk if you want to meet up this week!
    848
  • Gabriel Mukobi
    @gabemukobi
    Jul 23, 2024
    📜Delighted to have contributed to "Open Problems in Technical AI Governance!" Pure ML safety and alignment may look pretty doomed these days, but ML, hardware, and security researchers can instead contribute to hundreds of open technical questions that improve AI governance!
    Anka Reuel | @ankareuel.bsky.social
    @AnkaReuel
    Jul 23, 2024
    Our new paper "Open Problems in Technical AI Governance" led by @ben_s_bucknall & me is out! We outline 89 open technical issues in AI governance, plus resources and 100+ research questions that technical experts can tackle to help AI governance efforts🧵 t.ly/Y-mQ1
    895
  • Gabriel Mukobi
    @gabemukobi
    May 15, 2024
    Replying to @MichaelTrazzi
    Lol I wish I was that good at prediction and not just unfortunate luck 🙃
    1.7K
  • Gabriel Mukobi
    @gabemukobi
    Jun 10, 2024
    Super excited for our paper to contribute to the emerging research landscape around a scientific, transparent, and predictable understanding of AI model evaluations! 📈
    Rylan Schaeffer
    @RylanSchaeffer
    Jun 10, 2024
    ❤️‍🔥❤️‍🔥Excited to share our new paper ❤️‍🔥❤️‍🔥 **Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?** w/ @haileysch__ @BrandoHablando @gabemukobi @varunrmadan @herbiebradley @ai_phd @BlancheMinerva @sanmikoyejo arxiv.org/abs/2406.04391 1/N
    1.5K
  • Gabriel Mukobi
    @gabemukobi
    Dec 23, 2018
    I got an offer from Google, and I took it. I'm elated to be interning as a Googler in Engineering Practicum this summer! instagram.com/p/BruKVPjHoLox…
