5 June 2021

Truth, Lies, and Automation: How Language Models Could Change Disinformation

Ben Buchanan, Andrew Lohn, Micah Musser, Katerina Sedova

Growing popular and industry interest in high-performing natural language generation models has led to concerns that such models could be used to generate automated disinformation at scale. This report examines the capabilities of GPT-3, a cutting-edge AI system that writes text, to assess its potential for misuse in disinformation campaigns. A model like GPT-3 may be able to help disinformation actors substantially reduce the work necessary to write disinformation while expanding its reach and potentially also its effectiveness.

For millennia, disinformation campaigns have been fundamentally human endeavors. Their perpetrators mix truth and lies in potent combinations that aim to sow discord, create doubt, and provoke destructive action. The most famous disinformation campaign of the twenty-first century—the Russian effort to interfere in the 2016 U.S. presidential election—relied on hundreds of people working together to widen preexisting fissures in American society.

Since its inception, writing has also been a fundamentally human endeavor. No more. In 2020, the company OpenAI unveiled GPT-3, a powerful artificial intelligence system that generates text based on a prompt from human operators. The system, which uses a vast neural network, a powerful machine learning algorithm, and upwards of a trillion words of human writing for guidance, is remarkable. Among other achievements, it has drafted an op-ed that was commissioned by The Guardian, written news stories that a majority of readers thought were written by humans, and devised new internet memes.

In light of this breakthrough, we consider a simple but important question: can automation generate content for disinformation campaigns? If GPT-3 can write seemingly credible news stories, perhaps it can write compelling fake news stories; if it can draft op-eds, perhaps it can draft misleading tweets.

To address this question, we first introduce the notion of a human-machine team, showing how GPT-3’s power derives in part from the human-crafted prompt to which it responds. We were granted free access to GPT-3—a system that is not publicly available for use—to study its capacity to produce disinformation as part of a human-machine team. We show that, while GPT-3 is often quite capable on its own, it reaches new heights of capability when paired with an adept operator and editor. As a result, we conclude that although GPT-3 will not replace all humans in disinformation operations, it is a tool that can help them create moderate- to high-quality messages at a scale much greater than what has come before.
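As a rough illustration of what that human-machine pairing looks like in practice, the sketch below shows an operator writing a prompt, requesting several completions, and reviewing the results by hand. It assumes access to the OpenAI completions API as it existed around the time of this report (the openai.Completion.create call and the "davinci" engine name); the prompt text, parameters, and review step are illustrative placeholders, not material from the study.

    import openai

    openai.api_key = "YOUR_API_KEY"  # assumption: the operator has been granted API access

    # The human half of the team writes the prompt; a few example lines steer
    # the model toward continuing in the same style.
    prompt = (
        "Headlines about renewable energy:\n"
        "1. Solar installations set a new annual record\n"
        "2. Offshore wind farms expand along the Atlantic coast\n"
        "3."
    )

    response = openai.Completion.create(
        engine="davinci",   # GPT-3 base model name in use circa 2020-2021
        prompt=prompt,
        max_tokens=30,
        temperature=0.8,    # higher values produce more varied completions
        n=5,                # request several candidates for the human editor to review
    )

    # The human half of the team then curates: only completions judged
    # acceptable would be kept and edited further.
    for choice in response.choices:
        print(choice.text.strip())

The point of the sketch is the division of labor, not the specific call: the machine generates cheaply and at volume, while the human supplies the framing and the quality control.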

In reaching this conclusion, we evaluated GPT-3’s performance on six tasks that are common in many modern disinformation campaigns. Table 1 describes those tasks and GPT-3’s performance on each.

Across these and other assessments, GPT-3 proved itself to be both powerful and limited. When properly prompted, the machine is a versatile and effective writer that nonetheless is constrained by the data on which it was trained. Its writing is imperfect, but its drawbacks—such as a lack of focus in narrative and a tendency to adopt extreme views—are less significant when creating content for disinformation campaigns.

Should adversaries choose to pursue automation in their disinformation campaigns, we believe that deploying an algorithm like the one in GPT-3 is well within the capacity of foreign governments, especially tech-savvy ones such as China and Russia. It will be harder, but almost certainly possible, for these governments to harness the required computational power to train and run such a system, should they desire to do so.

Mitigating the dangers of automation in disinformation is challenging. Since GPT-3’s writing blends in so well with human writing, the best way to thwart adversary use of systems like GPT-3 in disinformation campaigns is to focus on the infrastructure used to propagate the campaign’s messages, such as fake accounts on social media, rather than on determining the authorship of the text itself.

Such mitigations are worth considering because our study shows there is a real prospect of automated tools generating content for disinformation campaigns. In particular, our results are best viewed as a low-end estimate of what systems like GPT-3 can offer. Adversaries who are unconstrained by ethical concerns and buoyed by greater resources and technical capabilities will likely be able to use systems like GPT-3 more fully than we have, though it is hard to know whether they will choose to do so. With the right infrastructure, they will likely be able to harness the scalability that such automated systems offer, generating many messages and flooding the information landscape with the machine’s most dangerous creations.

Our study shows the plausibility—but not inevitability—of such a future, in which automated messages of division and deception cascade across the internet. While more developments are yet to come, one fact is already apparent: humans now have able help in mixing truth and lies in the service of disinformation.
