Geoffrey Irving (@geoffreyirving) / X

Geoffrey Irving

4,811 posts

@geoffreyirving

Cofounder and Chief Scientist at Sequent Research. Alignment will be solved, but not necessarily in time. Previously AISI, DeepMind, OpenAI, Google Brain, etc.

London → Berkeley

naml.us/blog

Joined September 2009

Pinned
Geoffrey Irving
@geoffreyirving
Jun 10
We are starting a new, nonprofit alignment organization, ⊢ Sequent Research, bringing together researchers previously on UK AISI’s Alignment Team, Timaeus, and elsewhere to research how to align superintelligence. We are hiring! 🧵
199K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
I have no details of OpenAI's Board’s reasons for firing Sam, and I am conflicted (lead of Scalable Alignment at Google DeepMind). But there is a large, very loud pile on vs. people I respect, in particular Helen Toner and Ilya Sutskever, so I feel compelled to say a few things.
1.2M
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
First, these are my personal views, not those of my employer. But still, a big grain of salt due to conflicts of interest.
142K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
Second, my prior is strongly in favor of Helen and Ilya (I know the other two less well). Ilya is a terrific ML researcher, I had a first-hand seat for two years of his views around safety improving, and my sense from others is that they kept improving after I left in 2019.
144K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
Helen is a terrific strategic and generalist researcher, is extremely thoughtful, and my interactions since I met her in 2018 have been uniformly great. She took the board seat with reservations...
140K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
...I advised against it at the time as I thought the governance structure was broken, but she and other people I trust felt it was important to have more independent members on the Board (I’m not yet sure who was right in hindsight).
143K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
Third, my prior is strongly against Sam after working for him for two years at OpenAI: 1. He was always nice to me. 2. He lied to me on various occasions 3. He was deceptive, manipulative, and worse to others, including my close friends (again, only nice to me, for reasons)
505K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
The “always nice to me” part puts me in a weird situation: most of the negative stuff I know about happened to others, and even when he lied to me it was about other people to try to drive wedges. So it feels rough for me to unilaterally share it.
146K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
As I mentioned, I don’t have any details of the Board’s reasons for their action, besides what is publicly known. I am very confused about what happened on Friday and over the weekend: from public information it seems like big tactical and strategic mistakes were made...
150K
Geoffrey Irving
@geoffreyirving
Nov 21, 2023
Replying to @geoffreyirving
...but that seems inconsistent with the thoughtfulness of the people involved. At this point, mostly what feels right is withholding complete judgment until more information lands. I appreciate that will seem unreasonable to a lot of people, but it’s the best suggestion I have.
159K
Geoffrey Irving
@geoffreyirving
Feb 5, 2024
I am happy to announce that I will be joining the UK AI Safety Institute (AISI) soon as a Research Director. Over 2023 I have been very impressed with the progress made by the UK via the AI Safety Institute and AI Safety Summit, and I am excited to join the team!
Ian Hogarth
@soundboy
Feb 5, 2024
1/ The AI Safety Institute has been in operation for almost eight months and I'm excited to announce some huge new hires. We have begun pre-deployment testing for potentially harmful capabilities on advanced AI systems. This is our third progress report: gov.uk/government/pub…
57K
Geoffrey Irving
@geoffreyirving
Jun 17, 2025
New alignment theory paper! We present a new scalable oversight protocol (prover-estimator debate) and a proof that honesty is incentivised at equilibrium (with large assumptions, see 🧵), even when the AIs involved have similar available compute.
30K
Geoffrey Irving
@geoffreyirving
May 20, 2024
Here’s a thread about why I joined the UK AI Safety Institute (@aisafetyinst) as a Research Director for technical safety and why I think other technical folks should consider roles here (aisi.gov.uk/careers):
50K
Geoffrey Irving
@geoffreyirving
Jul 3, 2024
Yoshua Bengio is looking for theory folk to join him to work on Bayesian approaches to AGI safety. I think this is a great opportunity: I've quite enjoyed the theory discussions I've had with Yoshua so far, and would love more work in this direction.
48K