In his talk “The Catastrophic Risks of AI - and a Safer Path”, Yoshua Bengio warns that agentic AI is on the rise, and describes the conditions under which its rise might cause humans to go extinct in the near future. In this article, I summarise 16 lessons that I have drawn from his talk.
Agentic AI
1. We might be able to build AI that is agentic in the near future.
- Bengio argues that we are “going there” (8:55), towards being able to build agentic AI in “just a few years […] or a decade” (8:52), because “there’s huge commercial pressure to build AIs with greater and greater agency to replace human labour” (8:57), “[h]undreds of billions of dollars are being invested every year on developing this technology” (5:31), and “this is growing” (5:38).
2. We are willing to build AI that is agentic in the near future.
- Bengio argues that tech companies “have a stated goal of building machines that will be smarter than us, that can replace human labor” (5:40).
3. We might build AI that is agentic in the near future.
- We might be both able and willing to build AI that is agentic in the near future (see [1] & [2]). If, in addition, we will build agentic AI whenever we are both able and willing to do so, then we might build AI that is agentic in the near future (a formal sketch of this inference follows this point).
- Bengio argues that it is “very plausible” (9:31) that humans will “build machines that […] have their own agency” (9:26). He also argues that AI is “getting better exponentially fast” (6:54), and that it “would take about five years to reach human level” (14:01).
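The inference in [3] is a possibility-monotonicity step: from “might (able and willing)” and “able and willing implies build”, conclude “might build”. Below is a minimal sketch in Lean 4, reading “might” as a monotone modal operator; `Might`, `might_mono`, `Able`, `Willing`, and `Build` are labels of my own, not terms from the talk.

```lean
-- Hypothetical formalization: these names are my own labels, not the talk's.
axiom Might : Prop → Prop

-- Possibility is monotone: if p entails q, then "might p" entails "might q"
-- (the diamond-monotonicity rule of normal modal logics).
axiom might_mono : {p q : Prop} → (p → q) → Might p → Might q

axiom Able    : Prop  -- we are able to build agentic AI in the near future ([1])
axiom Willing : Prop  -- we are willing to build agentic AI in the near future ([2])
axiom Build   : Prop  -- we build agentic AI in the near future

-- [3] from [1] & [2], plus the bridge premise stated in the text.
theorem might_build
    (h12    : Might (Able ∧ Willing))     -- from [1] & [2]
    (bridge : Able ∧ Willing → Build) :   -- "we will build if able and willing"
    Might Build :=
  might_mono bridge h12
```

The same rule carries the parallel inference inside [6] and the step from [3] & [11] to [12]: each combines a “might” premise with a conditional bridge premise.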
Agentic AI with Human Extinction
4. If AI is agentic, and there are no guardrails against our eradication by AI, then AI will be able to eradicate us.
- Bengio argues that if AI has “their own agency” (9:36) and we have neither “the scientific answers nor the societal guardrails” (9:07), then “we’re not ready” (9:05) but are “playing with fire” (9:10).
5. If AI goals are not aligned with our goals, then AI might be willing to eradicate us.
- Bengio argues that “if they really want to make sure we would never shut them down, they would have an incentive to get rid of us” (8:34). It seems reasonable that if AI goals are not aligned with our goals, then AI might want to make sure that “we would never shut them down”. If so, then his argument implies that AI might “have an incentive to get rid of us”.
6. If AI is agentic, there are no guardrails against our eradication by AI, and AI goals are not aligned with our goals, then AI might eradicate us.
- If AI is agentic, there are no guardrails against our eradication by AI, and AI goals are not aligned with our goals, then AI might be both able and willing to eradicate us (see [4] & [5]). If, in addition, AI will eradicate us whenever it is both able and willing to do so, then AI might eradicate us.
7. AI might be agentic, and AI goals might not be aligned with our goals, in the near future.
- Bengio argues that it is “very plausible” (9:31) that we will “build machines that […] have their own agency, their own goals which may not be aligned with ours” (9:26).
8. If there are no guardrails against our eradication by AI in the near future, then AI might eradicate us in the near future.
- This follows from [6] & [7].
- Bengio argues that “we didn’t, and we still don’t, have ways to make sure this technology eventually doesn’t turn against us” (4:18), and that even “a sandwich has more regulation than AI” (9:20), so that “it is very plausible that” we will go “Poof!” (9:31) if we continue on “this trajectory” (10:01) “despite the warnings of scientists like [himself]” (9:58).
Agentic AI without Human Extinction
9. If AI is agentic, then AI will be unable to eradicate us only if we have guardrails against our eradication by AI.
- This follows from [4] by contraposition (a formal sketch follows this point).
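Spelled out, [9] is the contrapositive of [4], which the “only if” phrasing can obscure. A minimal Lean 4 sketch, again with hypothetical labels of my own; the direction from “not able” to “guardrails” needs classical logic (double-negation elimination):

```lean
-- Hypothetical labels of my own, as before.
axiom Agentic         : Prop  -- AI is agentic
axiom Guardrails      : Prop  -- we have guardrails against our eradication by AI
axiom AbleToEradicate : Prop  -- AI is able to eradicate us

-- [9] as the contrapositive of [4]: from "no guardrails implies able" to
-- "not able implies guardrails", via Classical.byContradiction.
theorem step9
    (h4 : Agentic ∧ ¬Guardrails → AbleToEradicate) :
    Agentic → (¬AbleToEradicate → Guardrails) :=
  fun hAgentic hNotAble =>
    Classical.byContradiction fun hNoGuard =>
      hNotAble (h4 ⟨hAgentic, hNoGuard⟩)
```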
10. We need AI to not be able to eradicate us.
- Bengio argues that “[m]itigating the risk of extinction from AI should be a global priority” (4:58).
11. If AI is agentic, then we need guardrails against our eradication by AI.
- This follows from [9] & [10].
12. We might need guardrails against our eradication by AI in the near future.
- This follows from [3] & [11] (a formal sketch follows this point).
- Bengio argues that “[w]e need a lot more of these scientific projects to explore solutions to the AI safety challenges, and we need to do it quickly” (11:45).
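This is the same possibility-monotonicity rule as in the sketch under [3], now applied to [11]; the declarations are repeated so the fragment compiles on its own, and the elided bridge “what we build in [3] is agentic AI” is folded into the hypothesis `h11`:

```lean
-- Repeats the hypothetical vocabulary from the sketch under [3].
axiom Might : Prop → Prop
axiom might_mono : {p q : Prop} → (p → q) → Might p → Might q

axiom BuildAgentic   : Prop  -- we build agentic AI in the near future ([3])
axiom NeedGuardrails : Prop  -- we need guardrails against our eradication by AI ([11])

-- [12] from [3] & [11].
theorem might_need
    (h3  : Might BuildAgentic)
    (h11 : BuildAgentic → NeedGuardrails) :
    Might NeedGuardrails :=
  might_mono h11 h3
```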
13. Scientist AI can be a guardrail against the bad actions of AI only if we help it.
- Bengio defines a Scientist AI as an AI that is “modelled after a selfless, ideal scientist who’s only trying to understand the world, without agency” (10:42), and he argues that Scientist AI “could be used as a guardrail against the bad actions of an untrusted AI agent” (11:17), but that “[they] need [our] help for this project” (12:29).
14. Our eradication by AI in the near future is a bad action of AI.
- Bengio argues that the “risk of extinction” (4:58) is a “catastrophic risk” (1:54). It seems reasonable that an action of AI which realises a catastrophic risk is a bad action of AI.
15. Scientist AI can be a guardrail against our eradication by AI in the near future only if we help it.
- This follows from [13] & [14], by instantiating the general claim in [13] at our eradication (a formal sketch follows this point).
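Unlike the earlier steps, this one is first-order: [13] is a general claim over bad actions, and [14] supplies the instance at which it is applied. A minimal Lean 4 sketch, with hypothetical labels of my own:

```lean
axiom Action      : Type
axiom BadAction   : Action → Prop  -- a is a bad action of AI
axiom CanGuard    : Action → Prop  -- Scientist AI can be a guardrail against a
axiom Help        : Prop           -- we help the Scientist AI project
axiom Eradication : Action         -- our eradication by AI in the near future

-- [15] from [13] & [14] by instantiating the universal claim at Eradication.
theorem step15
    (h13 : ∀ a, BadAction a → (CanGuard a → Help))
    (h14 : BadAction Eradication) :
    CanGuard Eradication → Help :=
  h13 Eradication h14
```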
16. If we want Scientist AI to be a guardrail against our eradication by AI in the near future, then we should help it.
- This follows from [15], together with the principle that we should take the means that are necessary for the ends we want.