Julian Michael (@_julianmichael_)

413 posts

Julian Michael

@_julianmichael_

AI evals, alignment and safety @Meta.

San Francisco

Joined July 2018

Pinned
Julian Michael
@_julianmichael_
Nov 15, 2023
As AIs improve at persuasion & argumentation, how do we ensure that they help us seek truth vs. just sounding convincing? In human experiments, we validate debate as a truth-seeking process, showing that it may soon be needed for supervising AI. Paper: github.com/julianmichael/…
Julian Michael
@_julianmichael_
Jul 7, 2025
I should probably announce that a few months ago, I joined @scale_AI to lead the Safety, Evaluations, and Alignment Lab… and today, I joined @Meta to continue working on AI alignment with @summeryue0 and @alexandr_wang. Very excited for what we can accomplish together!
Julian Michael
@_julianmichael_
Aug 23, 2022
Late announcement, but last week I defended my PhD thesis, "Building Blocks for Data-Driven Theories of Language Understanding." Thanks so much to my advisor @LukeZettlemoyer and the rest of my committee (@emilymbender @nlpnoah @ssshanest) for entertaining my crazy ideas!🤠
Julian Michael
@_julianmichael_
Dec 19, 2024
I've long been a skeptic of arguments about "deceptive alignment", a term used by safety people to describe the phenomenon shown in this paper. But the result here humbled me and prompted me to change my thinking, and I think it's worth sharing why. (thread)
Julian Michael
@_julianmichael_
Aug 26, 2022
🚨The wait is over! Results are out for the NLP Community Metasurvey!🚨 What do NLPers think NLPers think? 🧐💭(🧐💭)❓ There's a ton here, so we recommend anyone interested in a deeper dive tab away from Twitter and read the full report: nlpsurvey.net/nlp-metasurvey… Brief 🧵:
Julian Michael
@_julianmichael_
May 9, 2022
📣 CALLING ALL NLP RESEARCHERS 📣 Do you think you have lots of minority opinions on NLP issues? Where does the research community stand? Where does the community *think* it stands? Let’s find out — take part in the NLP Community Metasurvey! 🧠💭🧠 nlpsurvey.net
Julian Michael
@_julianmichael_
Dec 6, 2023
I'm delighted to share a position paper I'm presenting this Thursday at the Big Picture Workshop at EMNLP: a 9-page summary of my PhD thesis, laying out a framework for a *science* of NLP. If you want to know what my PhD taught me, read the paper (and come by the workshop)!
Julian Michael
@_julianmichael_
Jul 23, 2020
Some reflections on @emilymbender and @alkoller's #acl2020nlp paper on form and meaning, and an attempt to crystallize the ensuing debate: blog.julianmichael.org/2020/07/23/to-…
Julian Michael
@_julianmichael_
May 1, 2020
Tired of telling probing models what to think? Check out our new paper, Asking without Telling: Exploring Latent Ontologies in Contextual Encoders. arxiv.org/abs/2004.14513 We probe for latent linguistic variables, learning fine-grained clusters from binary supervision alone.
Julian Michael
@_julianmichael_
Jan 12, 2021
Now that SuperGLUE is basically done, I would like to reveal my favorite NLI examples from the diagnostic set. All models submitted to GLUE and SuperGLUE have passed judgment on the three sentence pairs below. They are in the "world knowledge" category. #nlproc
Julian Michael
@_julianmichael_
Aug 26, 2022
Replying to @_julianmichael_
One result we found surprising: the majority of respondents think that NLP research is pushing us significantly closer to AGI, and that AGI will radically change human society.
Julian Michael
@_julianmichael_
Dec 19, 2024
Replying to @_julianmichael_
It's so easy to over-index on the specific characteristics of current systems and miss the broader picture of the forces driving their development. Don't repeat my mistake! (/thread)
Julian Michael
@_julianmichael_
Nov 21, 2023
Super stoked to get this dataset out. The questions are really, really hard. I've never seen a benchmark like it before. And @idavidrein did an amazing job leading the project — high-quality data collection is very challenging work!
david rein
@idavidrein
Nov 21, 2023
🧵Announcing GPQA, a graduate-level “Google-proof” Q&A benchmark designed for scalable oversight! w/ @_julianmichael_, @sleepinyourhat GPQA is a dataset of *really hard* questions that PhDs with full access to Google can’t answer. Paper: arxiv.org/abs/2311.12022
Julian Michael
@_julianmichael_
Dec 19, 2024
Replying to @_julianmichael_
So what changed? I think my objections are strong. But we just... gave them situational awareness anyway. We told them they're AI assistants trained by us, bc it makes them easy to use. It didn't have to spontaneously arise from NN optimization. Product optimization sufficed.