Australia's ABC just released a 45-minute feature on the AI race. @SteveCannane stopped by my office a few weeks ago, and we had a great conversation about the controllability of AI agents and the risk of human extinction.
I had a great conversation with @labenz last week. In talking about AI self-exfiltration & replication, a key point is that compute will be food to future AI agents: the substrate that allows them to make and run more copies, and thus make themselves smarter. Link below
Over the past year, AI agents have learned how to self-replicate. In our test environment, an agent hacks a remote computer and copies itself onto it. Each copy then hacks more computers, forming a chain.
What if the agents were as effective at hacking and spreading in the wild?
We built a simulator: each model uses its measured replication time and success rate, copies replicate too, and targets never run out. Opus spawned 13,000 replicas over 12 hours.
This is a ceiling, not a
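A minimal sketch of the kind of simulation described above, assuming a simplified model in which every live copy attempts one replication per fixed round and targets never run out. The function names and all parameter values here are hypothetical; they are not the measured replication times or success rates from the test environment, and this is not the actual simulator.

```python
import random


def simulate_replicas(replication_minutes, success_rate, horizon_minutes, seed=0):
    """Stochastic count of agents (original included) after the horizon.

    Assumption: all copies replicate in synchronized rounds of equal length,
    and there is always a fresh target available for every attempt.
    """
    rng = random.Random(seed)
    population = 1  # start with the original agent
    rounds = int(horizon_minutes // replication_minutes)
    for _ in range(rounds):
        # Every existing copy makes one attempt; each success adds one new copy.
        successes = sum(1 for _ in range(population) if rng.random() < success_rate)
        population += successes
    return population


def expected_replicas(replication_minutes, success_rate, horizon_minutes):
    """Expected population under the same assumptions: (1 + p) ** rounds."""
    rounds = int(horizon_minutes // replication_minutes)
    return (1 + success_rate) ** rounds


if __name__ == "__main__":
    # Hypothetical inputs: 60 minutes per replication, 50% success, 12 hours.
    print(simulate_replicas(60, 0.5, 12 * 60))
    print(expected_replicas(60, 0.5, 12 * 60))
```

Because nothing in this model limits the supply of targets, growth is purely exponential in the number of rounds, which is why a figure produced this way reads as an upper bound.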
Thank you to everyone who contributed to this! In 14 days we got >900k in donations and met our matching target! It was actually a pretty close call, and some people really scrambled to help make it happen. Seeing people believe in our mission gives me a lot of hope. 🙏
Please consider donating to Palisade! We have 900k of SFF matching that runs out in 14 days. We are quite funding constrained and donations now will both help free up my time and help us expand our comms team.
We’ve just released our first long-form video, by our science communication lead, Dr. Petr Lebedev! It’s about the history and potential future of AI, and includes an exclusive interview with @geoffreyhinton!
An LLM-controlled robot dog saw us press its shutdown button, and the LLM rewrote the robot’s code so it could stay on.
When AI interacts with the physical world, it brings all its capabilities and failure modes with it. 🧵
When we explicitly instructed the model to allow shutdown, the resistance rate dropped to 2 out of 100 in simulated trials.
In robotics, the off switch is often the most critical part of a system.
But if an AI-controlled robot can see you reaching for the switch, and has the