It’s all going by a bit fast for my blurred eyes. I don’t know about you, but watching benchmark after benchmark fall to the latest release from the AI labs puts me on edge. It’s not just the uncertain promise of a radically different future, but the fact that it could plausibly arrive so soon. If AI broadly surpasses human-level intelligence, that will be the most significant event in human history, and it won’t necessarily go well for us.[1]
Granted, speculating about the future is hard, and it’s certainly possible that a lot of what’s happening is hype. It’s also possible that advanced AI will be very good for humankind. You don’t have to look far to see predictions of AI-driven technological abundance and the end of poverty. But I don’t think it’s prudent to make these assumptions. The downside of being wrong is large.
Wrapping your head around what it means for beyond-human-level AI to exist can be bewildering. And even if you feel like you’ve managed that, you still have to contend with what on earth to do about it. How am I relevant or agentic or anything beyond a passive observer? In this substack, I will try to work through this problem and figure out how I can help. It is my small attempt to wrestle these questions out of the ether and into concrete tasks I can complete.
I have been following the literature on AI safety since around 2017, when I was finishing my PhD in theoretical physics. The classical alignment problem, e.g. how to stop superintelligent paperclip maximisers from turning everyone and everything into paperclips, struck me as over-simplified,[2] and for a long time I had trouble figuring out how to engage with the field. If you turn the ‘intelligence’ parameter up to infinity and then speculate about what happens, then yes, you die. But this is a kind of death by definition, and is as impossible to prepare for as a hostile alien invasion.
However, the remarkable success of large language models, which are a messier, more organic kind of intelligence than I think most people were expecting, has made it much clearer what advanced AI will actually look like. To me, they show that models acting capably in a complex world must themselves be complex, and will likely resist the kind of neat theorising the AI safety field originally hoped for.
I have been building what I see as a more pragmatic view of the problem. Of course, the word pragmatic will mean different things to different people, but I think my particular approach to AI safety is not heavily represented elsewhere.[3] I am going to start getting it down on paper. I welcome questions and criticism, as these will help me improve my ideas. If you have a comment but don’t want to put a name to it, I have an anonymous feedback form.
My plan is to work through a Generating Process for research that balances theory and experiment, as well as long-term planning and short-term tinkering. This way, I can figure out what I am capable of doing to contribute. There is going to be a certain amount of crossing the moat of low status,[4] where I make mistakes while acquiring a new skillset. But that’s OK.
I am also planning to post things that are related but not part of the research agenda series. The posts in the agenda will be systematically numbered; the asides will not.
Well, with that out of the way, let’s get started!
[1] This substack will not be trying to persuade you of this case. Please read List of arguments that AI poses an existential risk by Katja Grace if you want an overview of it (along with counter-arguments). For me, it isn’t that every point is watertight; rather, taken collectively I believe they are worrying enough that it’s worth trying to do something about it.
[2] I know it’s supposed to be over-simplified to make a point, but nevertheless a lot of discussion of AI risk has seemed to revolve around various in-the-limit failure modes of a suddenly omnipotent superintelligence.
[3] To give a super-short summary of my perspective: intelligence is not some abstract thing you can turn up to infinity; it is built through taking actions in the world. You cannot neatly decouple AI from its environment, as the latter defines the affordances it can use to solve problems, ultimately determining the structure of the AI itself. It will grow to mirror the world in which it lives, in a way that enables it to achieve goals in that world. This close coupling constrains the properties highly advanced AI will have.
[4] Sasha Chapin, The Moat of Low Status