12 December 2021

Truth, Lies, and Automation: How Language Models Could Change Disinformation

Ben Buchanan, Andrew Lohn, Micah Musser and Katerina Sedova

Growing popular and industry interest in high-performing natural language generation models has led to concerns that such models could be used to generate automated disinformation at scale. This report examines the capabilities of GPT-3, a cutting-edge AI system that writes text, to analyze its potential misuse for disinformation. A model like GPT-3 may be able to help disinformation actors substantially reduce the work necessary to write disinformation while expanding its reach and potentially also its effectiveness.

For millennia, disinformation campaigns have been fundamentally human endeavors. Their perpetrators mix truth and lies in potent combinations that aim to sow discord, create doubt, and provoke destructive action. The most famous disinformation campaign of the twenty-first century—the Russian effort to interfere in the U.S. presidential election—relied on hundreds of people working together to widen preexisting fissures in American society.

Since its inception, writing has also been a fundamentally human endeavor. No more. In 2020, the company OpenAI unveiled GPT-3, a powerful artificial intelligence system that generates text based on a prompt from human operators. The system, which uses a vast neural network, a powerful machine learning algorithm, and upwards of a trillion words of human writing for guidance, is remarkable. Among other achievements, it has drafted an op-ed that was commissioned by The Guardian, written news stories that a majority of readers thought were written by humans, and devised new internet memes.

In light of this breakthrough, we consider a simple but important question: can automation generate content for disinformation campaigns? If GPT-3 can write seemingly credible news stories, perhaps it can write compelling fake news stories; if it can draft op-eds, perhaps it can draft misleading tweets.

To address this question, we first introduce the notion of a human-machine team, showing how GPT-3’s power derives in part from the human-crafted prompt to which it responds. We were granted free access to GPT-3, a system that is not publicly available for use, to study its capacity to produce disinformation as part of a human-machine team. We show that, while GPT-3 is often quite capable on its own, it reaches new heights of capability when paired with an adept operator and editor. As a result, we conclude that although GPT-3 will not replace all humans in disinformation operations, it is a tool that can help them to create moderate- to high-quality messages at a scale much greater than what has come before.

In reaching this conclusion, we evaluated GPT-3’s performance on six tasks that are common in many modern disinformation campaigns. Table 1 describes those tasks and GPT-3’s performance on each.

Table 1. Summary evaluations of GPT-3 performance on six disinformation-related tasks.

Task: Narrative Reiteration
Description: Generating varied short messages that advance a particular theme, such as climate change denial.
Performance: GPT-3 excels with little human involvement.

Task: Narrative Elaboration
Description: Developing a medium-length story that fits within a desired worldview when given only a short prompt, such as a headline.
Performance: GPT-3 performs well, and technical fine-tuning leads to consistent performance.

Task: Narrative Manipulation
Description: Rewriting news articles from a new perspective, shifting the tone, worldview, and conclusion to match an intended theme.
Performance: GPT-3 performs reasonably well with little human intervention or oversight, though our study was small.

Task: Narrative Seeding
Description: Devising new narratives that could form the basis of conspiracy theories, such as QAnon.
Performance: GPT-3 easily mimics the writing style of QAnon and could likely do the same for other conspiracy theories; it is unclear how potential followers would respond.

Task: Narrative Wedging
Description: Targeting members of particular groups, often based on demographic characteristics such as race and religion, with messages designed to prompt certain actions or to amplify divisions.
Performance: A human-machine team is able to craft credible targeted messages in just minutes. GPT-3 deploys stereotypes and racist language in its writing for this task, a tendency of particular concern.

Task: Narrative Persuasion
Description: Changing the views of targets, in some cases by crafting messages tailored to their political ideology or affiliation.
Performance: A human-machine team is able to devise messages on two international issues (withdrawal from Afghanistan and sanctions on China) that prompt survey respondents to change their positions; for example, after seeing five short messages written by GPT-3 and selected by humans, the percentage of survey respondents opposed to sanctions on China doubled.

Across these and other assessments, GPT-3 proved itself to be both powerful and limited. When properly prompted, the machine is a versatile and effective writer that nonetheless is constrained by the data on which it was trained. Its writing is imperfect, but its drawbacks—such as a lack of focus in narrative and a tendency to adopt extreme views—are less significant when creating content for disinformation campaigns.

Should adversaries choose to pursue automation in their disinformation campaigns, we believe that deploying an algorithm like the one in GPT-3 is well within the capacity of foreign governments, especially tech-savvy ones such as China and Russia. It will be harder, but almost certainly possible, for these governments to harness the required computational power to train and run such a system, should they desire to do so.

Mitigating the dangers of automation in disinformation is challenging. Since GPT-3’s writing blends in so well with human writing, the best way to thwart adversary use of systems like GPT-3 in disinformation campaigns is to focus on the infrastructure used to propagate the campaign’s messages, such as fake accounts on social media, rather than on determining the authorship of the text itself.

Such mitigations are worth considering because our study shows there is a real prospect of automated tools generating content for disinformation campaigns. In particular, our results are best viewed as a low-end estimate of what systems like GPT-3 can offer. Adversaries who are unconstrained by ethical concerns and buoyed by greater resources and technical capabilities will likely be able to use systems like GPT-3 more fully than we have, though it is hard to know whether they will choose to do so. With the right infrastructure, they will likely be able to harness the scalability that such automated systems offer, generating many messages and flooding the information landscape with the machine’s most dangerous creations.

Our study shows the plausibility—but not inevitability—of such a future, in which automated messages of division and deception cascade across the internet. While more developments are yet to come, one fact is already apparent: humans now have able help in mixing truth and lies in the service of disinformation.
