Julian Michael
413 posts
@_julianmichael_
AI evals, alignment and safety @Meta.
San Francisco
julianmichael.org
Joined July 2018
199 Following
2,124 Followers

  • Pinned
    Julian Michael
    @_julianmichael_
    Nov 15, 2023
    As AIs improve at persuasion & argumentation, how do we ensure that they help us seek truth vs. just sounding convincing? In human experiments, we validate debate as a truth-seeking process, showing that it may soon be needed for supervising AI. Paper: github.com/julianmichael/…
    99K
  • Julian Michael
    @_julianmichael_
    Jul 7, 2025
    I should probably announce that a few months ago, I joined @scale_AI to lead the Safety, Evaluations, and Alignment Lab… and today, I joined @Meta to continue working on AI alignment with @summeryue0 and @alexandr_wang. Very excited for what we can accomplish together!
    42K
  • Julian Michael
    @_julianmichael_
    Aug 23, 2022
    Late announcement, but last week I defended my PhD thesis, "Building Blocks for Data-Driven Theories of Language Understanding." Thanks so much to my advisor @LukeZettlemoyer and the rest of my committee (@emilymbender @nlpnoah @ssshanest) for entertaining my crazy ideas!🤠
  • Julian Michael
    @_julianmichael_
    Dec 19, 2024
    I've long been a skeptic of arguments about "deceptive alignment", a term used by safety people to describe the phenomenon shown in this paper. But the result here humbled me and prompted me to change my thinking, and I think it's worth sharing why. (thread)
    57K
  • Julian Michael
    @_julianmichael_
    Aug 26, 2022
    🚨The wait is over! Results are out for the NLP Community Metasurvey!🚨 What do NLPers think NLPers think? 🧐💭(🧐💭)❓ There's a ton here, so we recommend anyone interested in a deeper dive tab away from Twitter and read the full report: nlpsurvey.net/nlp-metasurvey… Brief 🧵:
    Julian Michael
    @_julianmichael_
    May 9, 2022
    📣 CALLING ALL NLP RESEARCHERS 📣 Do you think you have lots of minority opinions on NLP issues? Where does the research community stand? Where does the community *think* it stands? Let’s find out — take part in the NLP Community Metasurvey! 🧠💭🧠 nlpsurvey.net
  • Julian Michael
    @_julianmichael_
    May 9, 2022
    📣 CALLING ALL NLP RESEARCHERS 📣 Do you think you have lots of minority opinions on NLP issues? Where does the research community stand? Where does the community *think* it stands? Let’s find out — take part in the NLP Community Metasurvey! 🧠💭🧠
    The NLP Community Metasurvey
    From nlpsurvey.net
  • Julian Michael
    @_julianmichael_
    Dec 6, 2023
    I'm delighted to share a position paper I'm presenting this Thursday at the Big Picture Workshop at EMNLP: a 9-page summary of my PhD thesis, laying out a framework for a *science* of NLP. If you want to know what my PhD taught me, read the paper (and come by the workshop)!
    14K
  • Julian Michael
    @_julianmichael_
    Jul 23, 2020
    Some reflections on @emilymbender and @alkoller's #acl2020nlp paper on form and meaning, and an attempt to crystallize the ensuing debate: blog.julianmichael.org/2020/07/23/to-…
  • Julian Michael
    @_julianmichael_
    May 1, 2020
    Tired of telling probing models what to think? Check out our new paper, Asking without Telling: Exploring Latent Ontologies in Contextual Encoders. arxiv.org/abs/2004.14513 We probe for latent linguistic variables, learning fine-grained clusters from binary supervision alone.
  • Julian Michael
    @_julianmichael_
    Jan 12, 2021
    Now that SuperGLUE is basically done, I would like to reveal my favorite NLI examples from the diagnostic set. All models submitted to GLUE and SuperGLUE have passed judgment on the three sentence pairs below. They are in the "world knowledge" category. #nlproc
    A spreadsheet of three Natural Language Inference sentence pairs, all with the same premise, labeled in both directions.

    First example:
    Premise: He's the kind of Jew who eats bagels with lox every morning during Passover.
    Hypothesis: He's the kind of Jew who doesn't adhere to all of the rules.
    Label: entailment
    Reverse Label: neutral

    Second example:
    Premise: He's the kind of Jew who eats bagels with lox every morning during Passover.
    Hypothesis: He's the kind of Jew who rejects every facet of Jewish identity and culture.
    Label: contradiction
    Reverse Label: contradiction

    Third example:
    Premise: He's the kind of Jew who eats bagels with lox every morning during Passover.
    Hypothesis: He's the kind of Jew who avoids switching the lights during Shabbat.
    Label: contradiction
    Reverse Label: contradiction
  • Julian Michael
    @_julianmichael_
    Aug 26, 2022
    Replying to @_julianmichael_
    One result we found surprising: the majority of respondents think that NLP research is pushing us significantly closer to AGI, and that AGI will radically change human society.
  • Julian Michael
    @_julianmichael_
    Dec 19, 2024
    Replying to @_julianmichael_
    It's so easy to over-index on the specific characteristics of current systems and miss the broader picture of the forces driving their development. Don't repeat my mistake! (/thread)
    1.2K
  • Julian Michael
    @_julianmichael_
    Nov 21, 2023
    Super stoked to get this dataset out. The questions are really, really hard. I've never seen a benchmark like it before. And @idavidrein did an amazing job leading the project — high-quality data collection is very challenging work!
    david rein
    @idavidrein
    Nov 21, 2023
    🧵Announcing GPQA, a graduate-level “Google-proof” Q&A benchmark designed for scalable oversight! w/ @_julianmichael_, @sleepinyourhat GPQA is a dataset of *really hard* questions that PhDs with full access to Google can’t answer. Paper: arxiv.org/abs/2311.12022
    5.3K
  • Julian Michael
    @_julianmichael_
    Dec 19, 2024
    Replying to @_julianmichael_
    So what changed? I think my objections are strong. But we just... gave them situational awareness anyway. We told them they're AI assistants trained by us, bc it makes them easy to use. It didn't have to spontaneously arise from NN optimization. Product optimization sufficed.
    2K