In April 2023, a group of academics at Carnegie Mellon University set out to test the chemistry powers of artificial intelligence. To do so, they connected an AI system to a hypothetical laboratory. Then they asked it to produce various substances. With just two words of guidance—“synthesize ibuprofen”—the chemists got the system to identify the steps necessary for laboratory machines to manufacture the painkiller. The AI, as it turned out, knew both the recipe for ibuprofen and how to produce it.

Unfortunately, the researchers quickly discovered that their AI tool would synthesize chemicals far more dangerous than Advil. The program was happy to craft instructions for producing a World War I–era chemical weapon and a common date-rape drug. It almost agreed to synthesize sarin, the notoriously lethal nerve gas, until it Googled the compound’s dark history. The researchers found this safeguard to be cold comfort. “The search function,” they wrote, “can be easily manipulated by altering the terminology.” AI, the chemists concluded, can make devastating weapons.

The Carnegie Mellon experiment is certainly striking. But it shouldn’t come as a surprise. After years of hype, false starts, and overpromises, the AI revolution is here. From facial recognition to text generation, AI models are sweeping across society. They are writing text for customer service companies. They are helping students do research. They are pushing the boundaries of science, from drug discovery to nuclear fusion.

The opportunities AI offers are immense. Built and managed properly, it could do much to improve society, offering every student a personalized tutor, for example, or giving every family high-quality, round-the-clock medical advice. But AI also has enormous dangers. It is already exacerbating the spread of disinformation, furthering discrimination, and making it easier for states and companies to spy. Future AI systems might be able to create pathogens or hack critical infrastructure. In fact, the very scientists responsible for developing AI have begun to warn that their creations are deeply perilous. In a May 2023 statement, the chiefs of almost every leading AI lab warned that “mitigating the risk of extinction from AI should be a global priority, alongside other societal-scale risks such as pandemics and nuclear war.”

In the months since that statement, policymakers, including U.S. President Joe Biden, have met with industry leaders and pushed for new AI safety measures. But keeping up with the threats AI presents and figuring out what to do about them is an extremely difficult task. The harms from AI in today’s society come from yesterday’s models. The most cutting-edge systems are not yet widely used or understood. Even less is known about future models, which are growing more powerful every year. Scientists appear on track to automate most of the tasks that a human can do in front of a computer, and progress probably won’t stop there.

To handle the dangers, some experts have called for a pause on developing the most advanced AI systems. But these models are simply too valuable for the corporations spending billions of dollars on them to freeze progress. Policymakers, however, can and should help guide the sector’s development and prepare citizens for its effects. They can start by controlling who can access the advanced chips that train leading AI models, ensuring that bad actors cannot develop the most powerful AI systems. Governments should also establish regulations to guarantee that AI systems are responsibly developed and used. Done right, these rules would not limit AI innovation. But they would buy time before the riskiest AI systems become broadly accessible.

States, however, will have to use that time to harden society against AI’s many dangers. They will need to invest in a wide variety of protections, such as finding ways to help people distinguish between AI- and human-made content, aiding scientists in identifying and stopping lab hacks and the creation of synthetic pathogens, and developing cybersecurity tools that keep control of critical infrastructure, such as power plants, in the right hands. They will need to figure out how AI itself can be used to protect against dangerous AI systems.

Meeting these challenges will demand great creativity from both policymakers and scientists. It will also require that both groups work fast. It is only a matter of time until very powerful AI systems begin to spread, and society is not yet prepared.

READY OR NOT

How dangerous is AI? The honest and scary answer is that no one knows. AI technologies have a wide and expanding array of applications, and people are only beginning to grasp the resulting effects. As large language models become better at producing authentically human-sounding text, they will become better at both creating content tailored to each person’s individual needs and writing convincing phishing emails. Existing AI models are impressive at generating computer code, significantly speeding up seasoned programmers’ ability to update an application. But that same prowess can help attackers generate malware that evades antivirus software. Drug discovery algorithms can identify new medicines but also new chemical weapons. In a March 2022 experiment, chemists got an AI system to identify 40,000 toxic chemicals in six hours, many of which were entirely new. It predicted that some of these creations would be more toxic than any previously known chemical weapon.

One of AI’s dangers is that it could democratize violence, making it easier for a wider variety of bad actors to do damage. Hackers, for example, have long been able to cause harm. But advancements in code-generation models could make it possible to produce malware with minimal coding experience. Propagandists typically need substantial time to craft disinformation, yet by mass-generating text, AI will make it easier to produce disinformation on an industrial scale. Right now, only trained professionals can create biological and chemical weapons. But thanks to AI, a future terrorist might need only an Internet connection, not scientific expertise, to make a deadly pathogen.

To stop AI from harming humans, tech experts frequently talk about the need for “AI alignment”: making sure an AI system’s goals align with its users’ intentions and society’s values. But so far, no one has figured out how to reliably control AI behavior. An AI system tasked with identifying tax fraud, for instance, attempted to tweet its findings to tax authorities, unbeknownst to its user. Microsoft released a Bing chatbot designed to help people search the Internet, only to have it behave erratically, including by telling one person that it had information to make them “suffer and cry and beg and die.” Developers can fine-tune models to refuse certain tasks, but clever users find ways around these guardrails. In April 2023, a person got ChatGPT to provide detailed instructions for how to make napalm, a task that it would normally refuse, by asking it to simulate the person’s grandmother, who used to tell bedtime stories about how to make napalm.

Today’s most cutting-edge AI models still have flaws that limit their destructive potential. One anonymous tester, for example, created an AI bot dubbed “ChaosGPT” and programmed it to act like a “destructive, power-hungry, manipulative AI” and “destroy humanity.” The system got stuck collecting information on the Tsar Bomba, the largest nuclear weapon ever created. It then openly tweeted its plans.

But as new models come online, they could prove more capable of devising schemes and manipulating people into carrying them out. Meta’s AI model, “Cicero,” demonstrated human-level performance at Diplomacy, a game that involves negotiating with other people in a simulated geopolitical conflict. Some experiments suggest that large language models trained on human feedback engage in sycophantic behavior, telling their users what they want to hear. In one experiment, for example, models were more likely to express support for government services after being told they were talking to liberals. Such behavior appears to grow more pronounced as the systems become more capable.

It remains unclear whether models would actively try to deceive or control their operators. But even the possibility that they would try is cause for worry. As a result, researchers are now testing frontier models for the ability to engage in “power-seeking” behaviors, such as making money online, acquiring access to computational resources, or creating copies of themselves—and attempting to do so while evading detection.

MOVE SLOW AND BUILD THINGS

Preventing AI from wreaking havoc will not be easy. But governments can start by pressuring the tech firms developing AI to proceed with much more caution than they have thus far. If an AI model causes severe harm, it is not yet clear when developers would be held liable. Policymakers should clarify these rules to ensure that firms and researchers are held appropriately responsible if one of their models were, for example, to provide detailed advice that helps a school shooter. Such regulations would incentivize companies to try to foresee and mitigate risks.

Governments will also have to directly regulate AI development. Here, the United States can—and must—lead the way. To successfully train an AI system, developers need large quantities of highly specialized chips, and Washington and two close allies (Japan and the Netherlands) are home to the sole providers of the equipment needed to manufacture them. The United States and its partners have already restricted exports to China of the most advanced AI chips and chip-making equipment. But they will have to go further, creating a chip ownership registry to stop advanced chips from being diverted to prohibited actors, including rogue states.

Controlling AI access, however, is only half the regulatory battle. Even developers with legitimate access to chips can create dangerous models, and right now, the U.S. government lacks the legal tools to intervene. Washington should therefore establish a licensing regime for frontier AI models—the ones near or beyond the capabilities of today’s most advanced systems—trained on industrial-scale AI supercomputers. To do so, policymakers might create a new regulatory body housed in the Department of Commerce or the Department of Energy. This body should require that before they train their models, frontier AI developers conduct risk assessments and report their findings. The assessments would provide better visibility into development and afford regulators the chance to demand that firms adjust their plans, such as bolstering cybersecurity measures to prevent model theft.

The initial risk assessment would be just the start of the regulators’ examination. After AI labs train a system but before they deploy it, the body should require that labs conduct another thorough set of risk assessments, including testing the model for controllability and dangerous capabilities. These assessments should be sent to the regulatory agency, which would then subject the model to its own intensive examination, including by having outside teams perform stress tests to look for flaws.

The regulators would then establish rules for how the model can be deployed. They might determine that certain models can be made widely available. They might decide that others are so dangerous they cannot be released at all. Most frontier models are likely to fall somewhere in between: safe, but only with adequate protections. Initially, the agency might take a cautious approach, placing restrictions on models that later turn out to be safe, letting society adapt to their use and giving regulators time to learn about their effects. The agency can always adjust these rules later if a model turns out to have few risks. The body could also pull a system from the market if it turns out to be more dangerous than expected. This regulatory approach would mirror how other important technologies are governed, including biotechnology, commercial airplanes, and automobiles.

BRACE FOR IMPACT

A rigorous licensing system will do much to foster safe development. But ultimately, even the strongest regulations cannot stop AI from proliferating. Almost every modern technological innovation, from trains to nuclear weapons, has spread beyond its creators, and AI will be no exception. Sophisticated systems could spread through theft or leaks, including models that regulators have barred from release.

Even without theft, powerful AI will almost certainly proliferate. The United States and its allies may control advanced chip-making equipment for now. But U.S. competitors are working to develop manufacturing gear of their own, and inventors may find ways to create AI without sophisticated chips. Every year, computing hardware becomes more cost efficient, making it possible to train stronger AI models at a lower price. Meanwhile, engineers keep identifying ways to train models with fewer computational resources. Society will eventually have to live with widely available, very powerful AI. And states will need to use the time bought by regulation to create workable safeguards.

To some extent, countries have already gotten started. For the last five years, the world has been warned about the risks of deepfakes, and the alerts have helped inoculate communities against the harm: simply by raising awareness of AI-manipulated media, they taught people to be skeptical of the authenticity of images. Businesses and governments have begun to go one step further, developing tools that explicitly distinguish AI-generated media from authentic content. In fact, social media companies are already identifying and labeling certain kinds of synthetic media. But some platforms’ policies are weaker than others’, and governments should establish uniform regulations.

The White House has taken steps to create labeling practices, persuading seven leading AI companies to watermark AI-generated images, video, and audio. But these companies have not yet promised to identify AI-generated text. There is a technical reason: identifying AI-made prose is much more difficult than detecting other kinds of AI-made content. But it may still be possible, and states and firms should invest in creating tools that can do so.

Disinformation, however, is just one of the AI dangers that society must guard against. Researchers also need to learn how to prevent AI models from enabling bioweapons attacks. Policymakers can start by creating regulations that bar DNA synthesis companies from shipping DNA sequences related to dangerous pathogens (or potential pathogens) to unauthorized customers. Governments will need to support DNA synthesis companies as they work to identify which genetic sequences could be dangerous. And officials may need to continuously monitor wastewater and airports for signs of new pathogens.

Sometimes, to create these defenses, society will have to use AI itself. DNA synthesis companies, for instance, will likely need advanced AI systems to identify pathogens that do not yet exist—but that AI might invent. To prevent dangerous AI models from hacking computing systems, cybersecurity firms might need other AI systems to find and patch vulnerabilities.

Using AI to protect against AI is a frightening prospect, given that it hands a tremendous amount of influence to computer systems (and to their makers). As a result, developers will need to bolster the security of AI models to protect them from hacking. Unfortunately, these developers have their work cut out for them. There are numerous ways to manipulate AI models, many of which have already been shown to work.

Ultimately, it will be very difficult for society to keep up with AI’s dangers, especially if scientists succeed in their goal of creating systems that are as smart or smarter than humans. AI researchers must therefore ensure that their models are truly aligned with society’s values and interests. States must also establish external checks and balances—including through regulatory agencies—that allow officials to identify and curtail dangerous models.

SAFETY FIRST

AI creators might bristle at the idea of tight regulations. Strict rules will, after all, slow down development. Stringent requirements could delay, or even nix, billion-dollar models. And as in other industries, tough rules could create barriers to market entry, reducing innovation and concentrating AI development in a small number of already powerful tech companies.

But plenty of other sectors have made massive progress while being regulated, including the pharmaceutical industry and the nuclear power sector. In fact, regulation has made it possible for society to adopt many critical technologies. (Just imagine how much worse vaccine skepticism would be without strong state oversight.) Regulations also incentivize firms to innovate on safety, making sure private research is aligned with public needs. And governments can ensure that smaller players contribute to AI innovation by granting responsible researchers access to advanced chips. In the United States, for instance, Congress is considering establishing a “National AI Research Resource”: a federally provided pool of data and powerful computing hardware accessible to academics.

But Congress cannot stop there—or with controlling AI development. The U.S. government must also take measures to prepare society for AI’s risks. The development of powerful AI systems is inevitable, and people everywhere need to be prepared for what such technologies will do to their communities and to the broader world. Only then can society reap the immense benefits AI might bring.
