The spectrum of views on AI safety
I agree the concept of P(doom) is problematic. First, “doom” can mean a variety of things: human extinction, existential catastrophe or gradual disempowerment. Second, it is unclear what P(doom) conditions on: present-day regulations, or an AI slowdown? Finally, the timeframe matters, as P(doom within the next $X$ years) increases with $X$.
But perhaps we’re missing the point of the P(doom) question. If someone asks you for P(doom) at a cocktail party, it usually means they’re just interested in hearing general takes on AI safety, at least in my experience.
The P(doom) question isn’t entirely misguided, though. If your interlocutor specifies exactly what they mean by P(doom), say P(gradual disempowerment from power-seeking AI within the next decade | no regulations), and asks for the rough shape of your probability density function (PDF), then your answer immediately becomes more informative. By asking for a small set of well-chosen estimates, you could get a fairly accurate idea of someone’s core beliefs. But again, you have to pick the right estimates.
Finding these estimates is like asking for the relevant dimensions in a “political spectrum” of views on AI safety. If you were to visualise opinions within the AI safety space, what would be your axes? While such a plot would necessarily be a simplification, perhaps it could allow us to communicate our basic assumptions more effectively. This would lead to better-informed discussions in the AI safety community.
I imagine we want something between the P(doom) question and the kinds of questions used in expert surveys, like the 2023 Expert Survey on Progress in AI or the AI Reliability & Security Research Priorities. While the P(doom) question is too simple, the questionnaire questions are too complicated[1]. We’re looking for questions that are as simple as possible, but no simpler - the kinds of questions you could answer at a cocktail party.
Here are the five questions I wish people would ask me instead of asking for my P(doom). For some obvious variations, see the footnotes.
- AI timelines[2]: In what year will we have transformative AI, i.e. AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution[3]?
- A more informative P(doom)[4]: Assuming no further regulations on the development of AI systems, what is the probability of gradual disempowerment from AI systems before 2050?
- Threat model: Do the main risks from transformative AI come from bad actors developing destructive technologies and creating power-concentrating mechanisms or from AI systems seeking to eliminate humanity?
- Views on AI slowdown[5]: How heavily should the government regulate the development of future AI systems?
- Views on centralisation[6]: Should all leading AI companies be required to open-source their models, to ensure equal access to our most powerful AI systems?
These questions translate naturally into scales from -1 to 1. I also tried listing the questions in rough order of importance, so I’d use the first three questions for the axes of a 3D plot.
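As a rough illustration of what "translating into scales from -1 to 1" could look like, here is a minimal Python sketch for the first three questions. The function names, the 2025 to 2100 range for timelines, and the sign conventions for each axis are my own assumptions for the sake of the example, not a proposed standard.

```python
def to_scale(value, low, high):
    """Linearly map value from [low, high] onto [-1, 1], clamping at the ends."""
    if high <= low:
        raise ValueError("high must be greater than low")
    clamped = min(max(value, low), high)
    return 2 * (clamped - low) / (high - low) - 1


def position(tai_year, p_doom, threat_model):
    """Return (x, y, z) coordinates for the first three questions.

    Hypothetical conventions (assumptions, not from the post):
      tai_year:     2025 maps to -1, 2100 maps to 1
      p_doom:       probability 0 maps to -1, probability 1 maps to 1
      threat_model: already on [-1, 1]; -1 = risks mainly from bad actors,
                    1 = risks mainly from AI systems seeking to eliminate humanity
    """
    return (
        to_scale(tai_year, 2025, 2100),
        to_scale(p_doom, 0.0, 1.0),
        to_scale(threat_model, -1.0, 1.0),
    )
```

Calling `position(2040, 0.2, -0.5)` would give one person's point in the cube; collecting such points for several people gives the data for the 3D plot. A linear map is the simplest choice here, and a different mapping (for example a logarithmic one for timelines) might well be more appropriate.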
Going through these questions and plotting your position relative to that of others can be amusing. However, it’s also an instructive exercise. After all, these are important questions. Finally, I’ve also found it pretty handy having default answers to these extremely difficult questions at cocktail parties.
Thanks to Agatha Duzan for feedback on this text.
[1] For example, “Rate the extent to which you agree that resolving the core challenges of this sub-area and implementing the resulting solutions would significantly reduce the risk of severe harm (loss of >100 lives or >$10 billion in economic impact from AI)”, where a sub-area might be “Ethics-aware training and fine-tuning: Research on learning from imperfect ethical datasets, applying ethics-aware data curation methods, and incorporating collective ethical principles into model design.” Quite a mouthful.
[2] Variations: In what year will AI be capable of automating 99% of fully remote jobs? In what year will we have artificial general intelligence (AGI) - an AI which can match or exceed the cognitive abilities of human beings across any task?
[3] For an interesting discussion on this topic, see this moderated discussion between Ajeya Cotra, Daniel Kokotajlo and Ege Erdil.
[4] Variations: Just modify the conditions, the definition of “doom” or the timeframe. Alternatively, what is the probability of AI having a net positive effect on the world within the next 20 years?
[5] Variations: What might be the minimum sufficient intervention to prevent gradual disempowerment from AIs?
[6] Variations: Should leading AI labs be placed under state ownership?