Last spring, Daniel Kokotajlo, an A.I.-safety researcher working at OpenAI, quit his job in protest. He’d become convinced that the company wasn’t prepared for the future of its own technology, and wanted to sound the alarm. After a mutual friend connected us, we spoke on the phone. I found Kokotajlo affable, informed, and anxious. Advances in “alignment,” he told me—the suite of techniques used to insure that A.I. acts in accordance with human commands and values—were lagging behind gains in intelligence. Researchers, he said, were hurtling toward the creation of powerful systems they couldn’t control.
Kokotajlo, who had transitioned from a graduate program in philosophy to a career in A.I., explained how he’d educated himself in the field. At OpenAI, part of his job had been to track progress in A.I. and to construct timelines predicting when various thresholds of intelligence might be crossed. At one point, after the technology advanced unexpectedly, he’d had to shift his timelines up by decades. In 2021, he’d written a scenario about A.I. titled “What 2026 Looks Like.” Much of what he’d predicted had come to pass before the titular year. He’d concluded that a point of no return, when A.I. might become better than people at almost all important tasks, and be trusted with great power and authority, could arrive in 2027 or sooner. He sounded scared.
Around the same time that Kokotajlo left OpenAI, two computer scientists at Princeton, Sayash Kapoor and Arvind Narayanan, were preparing for the publication of their book, “AI Snake Oil: What Artificial Intelligence Can Do, What It Can’t, and How to Tell the Difference.” In it, Kapoor and Narayanan, who study technology’s integration with society, advanced views that were diametrically opposed to Kokotajlo’s. They argued that many timelines of A.I.’s future were wildly optimistic; that claims about its usefulness were often exaggerated or outright fraudulent; and that, because of the world’s inherent complexity, even powerful A.I. would change it only slowly. They cited many cases in which A.I. systems had been called upon to deliver important judgments—about medical diagnoses, or hiring—and had made rookie mistakes that indicated a fundamental disconnect from reality. The newest systems, they maintained, suffered from the same flaw.
Recently, all three researchers have sharpened their views, releasing reports that take their analyses further. The nonprofit AI Futures Project, of which Kokotajlo is the executive director, has published “AI 2027,” a heavily footnoted document, written by Kokotajlo and four other researchers, which works out a chilling scenario in which “superintelligent” A.I. systems either dominate or exterminate the human race by 2030. It’s meant to be taken seriously, as a warning about what might really happen. Meanwhile, Kapoor and Narayanan, in a new paper titled “AI as Normal Technology,” insist that practical obstacles of all kinds—from regulations and professional standards to the simple difficulty of doing physical things in the real world—will slow A.I.’s deployment and limit its transformational potential. While conceding that A.I. may eventually turn out to be a revolutionary technology, on the scale of electricity or the internet, they maintain that it will remain “normal”—that is, controllable through familiar safety measures, such as fail-safes, kill switches, and human supervision—for the foreseeable future. “AI is often analogized to nuclear weapons,” they argue. But “the right analogy is nuclear power,” which has remained mostly manageable and, if anything, may be underutilized for safety reasons.
Which is it: business as usual or the end of the world? “The test of a first-rate intelligence,” F. Scott Fitzgerald famously claimed, “is the ability to hold two opposed ideas in the mind at the same time, and still retain the ability to function.” Reading these reports back to back, I found myself losing that ability; speaking to their authors in succession, in the course of a single afternoon, I became positively deranged. “AI 2027” and “AI as Normal Technology” aim to describe the same reality, and have been written by deeply knowledgeable experts, but arrive at absurdly divergent conclusions. Discussing the future of A.I. with Kapoor, Narayanan, and Kokotajlo, I felt like I was having a conversation about spirituality with Richard Dawkins and the Pope.
In the parable of the blind men and the elephant, a group of well-intentioned people grapple with an unfamiliar object, failing to agree on its nature because each believes that the part he’s encountered defines the whole. That’s part of the problem with A.I.—it’s hard to see the whole of something new. But it’s also true, as Kapoor and Narayanan write, that “today’s AI safety discourse is characterized by deep differences in worldviews.” If I were to sum up those differences, I’d say that, broadly speaking, West Coast, Silicon Valley thinkers are drawn to visions of rapid transformation, while East Coast academics recoil from them; that A.I. researchers believe in quick experimental progress, while other computer scientists yearn for theoretical rigor; and that people in the A.I. industry want to make history, while those outside of it are weary of tech hype. Meanwhile, there are barely articulated differences on political and human questions—about what people want, how technology evolves, how societies change, how minds work, what “thinking” is, and so on—that help push people into one camp or the other.
An additional problem is simply that arguing about A.I. is unusually interesting. That interestingness may itself be a trap. When “AI 2027” appeared, many industry insiders responded by accepting its basic premises while debating its timelines (why not “AI 2045”?). Of course, if a planet-killing asteroid is headed for Earth, you don’t want NASA officials to argue about whether the impact will happen before or after lunch; you want them to launch a mission to change its path. At the same time, the kinds of assertions seen in “AI as Normal Technology”—for instance, that it might be wise to keep humans in the loop during important tasks, instead of giving computers free rein—have been perceived as so comparatively bland that they’ve long gone unuttered by analysts interested in the probability of doomsday.
When a technology becomes important enough to shape the course of society, the discourse around it needs to change. Debates among specialists need to make room for a consensus upon which the rest of us can act. The lack of such a consensus about A.I. is starting to have real costs. When experts get together to make a unified recommendation, it’s hard to ignore them; when they divide themselves into duelling groups, it becomes easier for decision-makers to dismiss both sides and do nothing. Currently, nothing appears to be the plan. A.I. companies aren’t substantially altering the balance between capability and safety in their products; in the budget-reconciliation bill that just passed the House, a clause prohibits state governments from regulating “artificial intelligence models, artificial intelligence systems, or automated decision systems” for ten years. If “AI 2027” is right, and that bill is signed into law, then by the time we’re allowed to regulate A.I. it might be regulating us. We need to make sense of the safety discourse now, before the game is over.
Artificial intelligence is a technical subject, but describing its future involves a literary truth: the stories we tell have shapes, and those shapes influence their content. There are always trade-offs. If you aim for reliable, levelheaded conservatism, you risk downplaying unlikely possibilities; if you bring imagination to bear, you might dwell on what’s interesting at the expense of what’s likely. Predictions can create an illusion of predictability that’s unwarranted in a fun-house world. In 2019, when I profiled the science-fiction novelist William Gibson, who is known for his prescience, he described a moment of panic: he’d thought he had a handle on the near future, he said, but “then I saw Trump coming down that escalator to announce his candidacy. All of my scenario modules went ‘beep-beep-beep.’ ” We were veering down an unexpected path.
“AI 2027” is imaginative, vivid, and detailed. It “is definitely a prediction,” Kokotajlo told me recently, “but it’s in the form of a scenario, which is a particular kind of prediction.” Although it’s based partly on assessments of trends in A.I., it’s written like a sci-fi story (with charts); it throws itself headlong into the flow of events. Often, the specificity of its imagined details suggests their fungibility. Will there actually come a moment, possibly in June of 2027, when software engineers who’ve invented self-improving A.I. “sit at their computer screens, watching performance crawl up, and up, and up”? Will the Chinese government, in response, build a “mega-datacenter” in a “Centralized Development Zone” in Taiwan? These particular details make the scenario more powerful, but might not matter; the bottom line, Kokotajlo said, is that, “more likely than not, there is going to be an intelligence explosion, and a crazy geopolitical conflict over who gets to control the A.I.s.”
It’s the details of that “intelligence explosion” that we need to follow. The scenario in “AI 2027” centers on a form of A.I. development known as “recursive self-improvement,” or R.S.I., which is currently largely hypothetical. In the report’s story, R.S.I. begins when A.I. programs become capable of doing A.I. research for themselves (today, they only assist human researchers); these A.I. “agents” soon figure out how to make their descendants smarter, and those descendants do the same for their descendants, creating a feedback loop. This process accelerates as the A.I.s start acting like co-workers, trading messages and assigning work to one another, forming a “corporation-within-a-corporation” that, with each cycle, becomes faster and more effective than the A.I. firm in which it’s ensconced. Eventually, the A.I.s begin creating better descendants so quickly that human programmers don’t have time to study them and decide whether they’re controllable.