6 August 2022

AI Startups and the Fight Against Mis/Disinformation Online: An Update

Anya Schiffrin, Hiba Beg, Juan Carlos Eyzaguirre

Introduction

In an address at Stanford University in April 2022, former president Barack Obama said that “one of the biggest reasons for democracies weakening is the profound change that’s taking place in how we communicate and consume information.”1 He pointed to the problem of disinformation and suggested that artificial intelligence (AI) would soon exacerbate the threat.

In many ways, Obama’s speech summarized an emerging consensus about the problems in the information ecosystem. Interest in these problems and discussion of solutions have grown among scholars, activists, and legislators since 2016, when investigations revealed the role of mis/disinformation in the US presidential election as well as in the Brexit referendum in the United Kingdom. Over the course of the coronavirus pandemic, worries about the effect of vaccine and public health mis/disinformation have grown. And Russia’s mis/disinformation campaign before and during its invasion of Ukraine once again revealed the high stakes of the problem and demonstrated the need for efforts to combat it, including comprehensive legislation governing social media platforms.2

The problem goes beyond politics, public health, and national security. The spread of mistruth makes it possible to finance phishing schemes, credit-card fraud, fundraising for fake charities, identity theft, and myriad other dark web activities. What may appear to be a political campaign may actually be a fund-raising scheme. False information and manipulated media are even prevalent on dating sites and TikTok, where appearances can be altered so that the final image differs substantially from the real one.

While many technology companies are committed to building trust in what is on their sites, including affirming the origin of content and ensuring that associated audio, video, and text are authentic, they continue to invest too little in addressing misinformation or deceptive media. As noted by Mounir Ibrahim of Truepic, a photo and video verification company, fixing online mis/disinformation is “either not part of their business model or antithetical to it.”3

As online mis/disinformation continues to grow and spread, so have attempts to address the problem. In other publications, the authors have discussed the rise of fact-checking and regulatory fixes. This paper looks at the market for tech-based solutions, many of which use some form of AI and machine/deep learning for content moderation, media integrity, and verification. Extending earlier research conducted for the German Marshall Fund in 2019,4 this paper focuses on a selection of niche entrepreneurial firms using AI to identify online mis/disinformation.

According to Justin Hendrix, founder and editor of Tech Policy Press, “the problem of online mis/disinformation is substantial and unsolvable. But there are nevertheless regulatory, reputational, and other commercial reasons to address it. This has created a market for a variety of solutions bought by governments and enterprises.” The analysis in this paper suggests that, while government regulation is critical, the economic and political incentives for mis/disinformation are so powerful—and the complexities of addressing it so substantial—that there is little chance the problem can be meaningfully solved by the market. The firms profiled in this paper—which have emerged to address what appears to be a relatively narrow commercial opportunity—have a role to play in stemming the tide. But, as is true of virtually all the initiatives tried since 2016 to combat mis/disinformation online, market growth has been slow and available financing limited.

Methodology

For this paper, 20 companies were surveyed through interviews to learn about their developing technologies, customers, views of the overall landscape, and expectations of the effects of current and potential regulations in Europe and the United States. The research also dug into the financial incentives for these solutions, the benefits and shortcomings of using these technologies to limit the spread of harmful content online, and the latest innovations in the field. The aim was to see whether the tech giants have turned to these firms for assistance in the fight against online mis/disinformation.

The use of AI and human content moderation can be seen as part of a spectrum of solutions to contain the flow of mis/disinformation as well as to shore up media integrity and verification. In the absence of overarching regulation, several measures have attempted to address the problem. For simplicity, Anya Schiffrin in 2017 divided the measures according to demand and supply.5 Demand-side measures tend to address audiences, or the consumers of content. They include teaching media literacy in schools so that young people can distinguish between truth, opinion, and false or misleading information,6 and building trust in journalism7 so that audiences can be appropriately skeptical and think critically about the source of information in order to separate truth from falsehoods. Rating efforts such as the Journalism Trust Initiative8 and NewsGuard9 strive to show audiences the look and feel of quality information. Supply-side measures aim to choke off the supply of mis/disinformation online by, in part, putting pressure on platforms to refuse its circulation.

The recent blocking of Russia’s RT and Sputnik by major platforms suggests that supply-side measures will remain the most powerful method to slow or halt the spread of misinformation, and their usage will likely increase once the European Union and United Kingdom pass bills aimed at stemming the harm from online mis/disinformation.10 These bills—including the EU’s Digital Services Act—may require social media platforms to step up their use of AI to identify and act on mis/disinformation online.

The Situation in 2019 and in 2022

Technology solutions alone cannot identify all forms of online mis/disinformation—humans are needed. Since 2019, AI applications have become more nuanced and sophisticated, but without human intervention they still cannot catch everything. “[Many] disinformation sites look, sound, and feel like an authentic site but publish false claims. AI can help identify content that needs to be reviewed, but I don’t think AI can work without a human in the loop,” observed Matt Skibinski, general manager of the ratings website NewsGuard.12 However, effective content moderation requires vast and costly skilled human labor for forensics, network analyses, and fact-checking, and is thus unlikely to scale.

Tech giants have no economic incentive to solve the problem of online mis/disinformation—government regulation is needed to push them to do more. The business models of platforms such as Facebook, Google, YouTube, and Twitter are built on engaging content, irrespective of accuracy or intent.13 Self-regulation and codes of conduct have helped but are not enough. Although regulations are difficult to enforce, the mere awareness of them may incentivize tech giants to increase removals or down-rank mis/disinformation on their sites. However, this may not be true in countries such as India and Brazil, where illiberal leaders decline to regulate mis/disinformation or hate speech because they themselves use it on social media for political purposes. This applies in the United States too, where Republicans and the far right benefit from the spread of conspiracy theories online.

Online mis/disinformation is not exclusively a technology problem—it is a by-product of broader political and economic systems, polarization, and lack of trust. It is also a matter for regulators who could, for example, require consumer protection and transparency or address the ease with which sites misrepresenting their backers or intentions can be set up. “Disinformation and misinformation have been approached as a technical issue. That’s the agenda of the big tech players. But more and more, elements are not technical. They are political, economic and regulatory. This is well understood in the industry,” said Alejandro Romero, chief operations officer and co-founder of Constella Intelligence, which monitors online mis/disinformation.14

What Is New?

The companies did not release their revenue figures, but it appears there is less of a market for AI solutions that track and halt mis/disinformation campaigns than previously thought. Funding for the startups surveyed does not seem to have grown significantly. Information gathered by Crunchbase, and confirmed in interviews, suggests that only four startups in this area (Truepic, Zignal Labs, Blackbird, and Logically) have received more than $10 million in funding since 2019.15

In search of reliable revenue streams, more than half of the companies surveyed are focusing on the business-to-business (B2B) market, selling mis/disinformation mitigation services to insurance companies, large public entities, and governments, among others. There appears to be a limited market for business-to-consumer (B2C) solutions for detecting mis/disinformation. The different business models and companies in this sector are discussed further below. Guyte McCord, chief operations officer of Graphika, provided an overview, saying: “We are yet to see a B2C scenario. There are consumer-facing applications (fake news detection, news source ratings, etc.), but they are sold through B2B.”16 Graphika uses AI to create detailed maps of social media landscapes to discover how online communities are formed and how information flows within large networks.17

Finally, AI is not the only effective technology. Content provenance and blockchain can help authenticate the accuracy or origin of information by watermarking particular pieces of content. News organizations in a number of countries are collaborating with companies on some of these initiatives. Whether these efforts can scale remains to be seen.

How AI Screens Online Mis/disinformation

AI is an easy-to-use technology that trains computers to perform specific analytical tasks based on repeated exposure to data. Success with AI-based tools thus hinges on compiling large, rich data sets. The firms surveyed generally tap AI for two mis/disinformation detection tasks: content analysis with natural language processing (NLP) and pattern recognition with machine/deep learning.

Content Analysis with NLP

NLP is an AI technique that teaches computers to understand speech and the intent and sentiment of text at a level of comprehension that approximates that of humans. NLP combines computational linguistics—a field that applies computer science to the analysis of language—with models from other AI subsets, including machine learning and deep learning.18

Firms using NLP for mis/disinformation detection generally draw on one of two approaches.

One approach—which to date has achieved less success—involves training an algorithm to classify assertions as true or false by showing it large numbers of assertions that have been manually labeled as true or false. For NLP to accurately identify mis/disinformation this way, the targeted types of speech must be consistently defined, and sufficient data is needed for training, validation, and testing. Unless the models are built with adequate, unbiased, and representative datasets, such as data from different platforms or geographic regions, results can be biased or misleading.
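To make the first approach concrete, here is a minimal sketch of a supervised claim classifier. It assumes a hypothetical claims.csv of manually labeled assertions, and the TF-IDF features plus logistic regression model are an illustrative baseline, not any surveyed firm's pipeline.

```python
# Sketch: train a claim classifier on manually labeled assertions.
# Assumes a hypothetical claims.csv with columns "text" and "label" (0 = false, 1 = true).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("claims.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# TF-IDF features plus logistic regression: a deliberately simple baseline.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# Held-out evaluation: the classifier is only as good as the labeled data,
# which is why biased or unrepresentative training sets produce misleading results.
print(classification_report(y_test, model.predict(X_test)))
```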

The other approach—more practical at present and more widely used—is to use AI to match text assertions with assertions in a fact-check database. With this method, the AI does not need to figure out what is true or not, but instead essentially performs a keyword search to match claims with fact-checks. This latter approach similarly requires a sizable database of fact-checks but does not require data to train the AI—the AI in this case is said to be “pre-trained.”
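The second approach can be sketched as a semantic lookup against a fact-check database. The model name, example claims, and similarity threshold below are illustrative assumptions, and the sentence-transformers library simply stands in for whatever pre-trained matching system a given firm actually uses.

```python
# Sketch: match incoming claims to existing fact-checks via semantic similarity.
# A pre-trained sentence-embedding model means no task-specific training data is needed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative pre-trained model

# Hypothetical fact-check database: claim text -> published verdict.
fact_checks = {
    "5G towers spread the coronavirus": "False (debunked by multiple health agencies)",
    "Drinking bleach cures COVID-19": "False (dangerous and untrue)",
}
fc_claims = list(fact_checks.keys())
fc_embeddings = model.encode(fc_claims, convert_to_tensor=True)

def lookup(claim: str, threshold: float = 0.6):
    """Return the closest fact-check if similarity clears the threshold."""
    scores = util.cos_sim(model.encode(claim, convert_to_tensor=True), fc_embeddings)[0]
    best = int(scores.argmax())
    if float(scores[best]) >= threshold:
        return fc_claims[best], fact_checks[fc_claims[best]], round(float(scores[best]), 3)
    return None  # no sufficiently similar fact-check on file

print(lookup("Mobile phone masts are spreading COVID"))
```

In practice such lookups typically run against tens of thousands of fact-checks indexed for fast nearest-neighbor search, but the matching logic is the same.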

Pattern Recognition with Machine/Deep Learning

Machine learning and deep learning—the techniques that also underpin NLP—involve training an algorithm on text and non-text data signals to imitate human learning, identify actor networks, and understand traffic patterns. A common example of machine learning is the recommendation engine embedded in platform apps: it collects user data, feeds those inputs into the platform’s algorithms, and learns user habits and preferences so that companies can better predict trends and user behavior.

An example of pattern recognition through machine learning was provided by Jennifer Granston, chief customer officer at San Francisco-based startup Zignal Labs:

We don’t label content as “true or false” or “harmful or not harmful.” NLP and different sentiment models allow us to identify, for example, what accounts on Twitter behave as if they are using a high level of automation—or which accounts are likely to be bots, click farms or troll farms—the ones propagating bad content.19
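As an illustration of the kind of behavioral signal Granston describes, the sketch below scores an account's posting timestamps for machine-like regularity. The feature weights and example timestamps are illustrative assumptions, not Zignal Labs' models; real systems combine many more signals.

```python
# Sketch: flag accounts whose posting rhythm looks automated.
# Timestamps are illustrative; real systems use many additional behavioral features.
from datetime import datetime, timedelta
from statistics import mean, pstdev

def automation_score(timestamps: list[datetime]) -> float:
    """Crude heuristic: very regular posting intervals and high volume suggest automation."""
    if len(timestamps) < 3:
        return 0.0
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    regularity = 1.0 - min(pstdev(gaps) / (mean(gaps) + 1e-9), 1.0)  # 1.0 = perfectly regular
    volume = min(len(timestamps) / 100.0, 1.0)  # saturates at 100 posts in the window
    return round(0.7 * regularity + 0.3 * volume, 3)

# An account posting exactly every 5 minutes gets the maximum regularity score.
start = datetime(2022, 1, 1)
bot_like = [start + timedelta(minutes=5 * i) for i in range(50)]
print(automation_score(bot_like))
```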

Blackbird.AI, a New York-based startup, uses machine learning and other automation and AI technologies to uncover patterns of malicious behavior and harmful narratives. These patterns might indicate the nature of relationships between users and the content they share, or identify the connections and shared beliefs of various online communities through what it calls a “coalition” signal. An example of an AI startup using deep learning is London-based Fabula AI. Founded in 2018, it pioneered the field of “geometric deep learning”: its deep learning algorithms map the geometric routes by which online content spreads across social networks. As a result, detecting malicious information or actors does not require reading or understanding the content itself.
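The intuition behind such content-agnostic, spread-based detection can be illustrated with a simple propagation graph. The reshare cascade and hand-picked structural features below are hypothetical stand-ins for the representations a geometric deep learning system would learn directly from the graph.

```python
# Sketch: represent how a post spreads as a directed graph and extract
# structural features (depth, breadth, reach) that a classifier could consume,
# without ever reading the content itself.
import networkx as nx

# Hypothetical reshare cascade: edge (a, b) means user b reshared from user a.
cascade = nx.DiGraph()
cascade.add_edges_from([
    ("origin", "u1"), ("origin", "u2"), ("origin", "u3"),
    ("u1", "u4"), ("u1", "u5"), ("u2", "u6"),
])

depth = nx.dag_longest_path_length(cascade)        # how many hops the content travelled
breadth = max(d for _, d in cascade.out_degree())  # widest single fan-out
reach = cascade.number_of_nodes() - 1              # users beyond the original poster

# These features would be one input to a downstream model; geometric deep
# learning instead learns such structure directly from the propagation graph.
print({"depth": depth, "breadth": breadth, "reach": reach})
```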

The Business Landscape

A few whales in this business offer enterprise solutions, and a very small number of startups are well capitalized; everyone else is a guppy or a minnow. The hope still seems to be that regulation may change the game. But the real story may be that the massive, centralized platforms are closed and prefer to build their own solutions, so there simply is not a well-developed marketplace, and perhaps there never will be.

Big tech, in particular, has proved a tough customer, in sharp contrast to the hopes expressed by the firms surveyed in 2019, when many planned to expand their customer base to the big platforms. As Danielle Deibler, co-founder and CEO of Marvelous AI, put it:

We would love to sell to Google and Facebook, but these large companies are trying to solve this problem themselves, and want to build [the tools] themselves. They don’t want to be subject to public scrutiny for their algorithms. I don’t see them paying lots of money for third parties.20

For example, rather than turning to the startups surveyed here, Meta tends to outsource21 much of its content moderation to third parties such as Accenture,22 Concentrix,23 and TaskUs.24 These companies hire the content moderators who make decisions about content removal and ranking. Meta is notorious for not hiring enough content moderators itself, and those it does hire are often based in the Philippines or India, where wages are relatively low.25

Corporate secrecy also deters Meta and other platform heavyweights from hiring firms like those profiled here. Factmata’s Antony Cousins said Facebook does not “want to work with third parties because they don’t want people to see how bad the problem is.” A notable exception is Kinzen,26 whose co-founders Mark Little and Áine Kerr have worked with large platforms on fact-checking. Barred by a non-disclosure agreement from going into details, Little noted that companies like Twitter are now scaling up their use of AI, anticipating that online mis/disinformation and other threats will grow before events such as elections.

Low Growth, New Business Model

The 2019 paper on the role of startups mentioned Silicon Valley’s monopsony and how hard it would be for entrepreneurs to scale up their businesses. This time, many of the startups reported that the lack of a growth path prompted them to change their strategy. While all noted the ubiquity of mis/disinformation online, which suggests a healthy market for their services, not all were able to grow their revenues. Some have shifted their customer base from the public to businesses. Others found a scant market for their services and have narrowed their focus. Nearly all sell services to companies that track how their brand is referred to online.

The services provided by the firms fall into the following, often overlapping, categories:

Ensuring security and mapping for governments

Combatting online extremism

Monitoring brand safety—often for corporate clients

Nonprofits and open source

Automated fact-checking

Improving the quality of user/reader engagement to grow target audiences

Security

Companies like Constella Intelligence analyze abnormal digital patterns and emerging digital risks such as mis/disinformation and online malign campaigns. They examine data across the full Internet—from surface digital communities and social networks to deep and dark web forums and breached data—and map techniques, tactics, and procedures to understand what sort of mis/disinformation is being spread, by whom, and how. Companies, governments, intelligence agencies, nongovernmental organizations, and media companies seeking to protect the integrity of authentic sites may buy these services to understand the risks that the unknown corners of the Internet pose to their customers, constituents, revenues, product lines, or to “real news” itself.

But the source of mis/disinformation is becoming increasingly muddy. “More and more of the disinformation toolbox is being used by local actors,” said Constella Intelligence’s Alejandro Romero. It is increasingly difficult to distinguish foreign from local mis/disinformation, and false or misleading content is ever more sophisticated and harder to detect. Disinformation is now a persistent threat supported by well-organized actors that sell their services—from bots to tailored deepfake videos—in specialized deep and dark web forums, making these tools accessible to anyone.

Disinformation has also become more ubiquitous. “A Crime-As-A-Service model implies that you don’t even need to fully understand the technology. Bad actors can rent a network of bots or a set of stolen identities to launch their malign campaigns,” Romero added.

Brand Safety

Other firms garner revenue by selling brand safety—services that use AI to help identify and counter mis/disinformation that may harm a firm’s online image or reputation. One such firm is London-based Factmata whose chief executive officer, Antony Cousins, said that firms know that “getting involved in a conversation on social media is a great way to find out what people are saying about your product.”27 Factmata helps companies track discussion of their brand across multiple social media platforms through fully automated AI. Cousins noted: “We do not judge the content ourselves. No humans are involved. We do not put our biases onto the content.”

Jay Pinho, of the brand safety division of Oracle,28 said his office judges “millions of pages a day online and then categorizes on an automated basis what they’re about so brands can make a decision about what they want to be near or far from.” For example, advertisers want to keep well away from controversial content such as that involving obscenity, hate speech, terrorism, or military conflict.
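A hedged sketch of what this kind of automated page categorization might look like, using an off-the-shelf zero-shot classifier; the category list, example text, and exclusion rule are illustrative assumptions, not Oracle's taxonomy or pipeline.

```python
# Sketch: categorize page text into brand-safety buckets with a zero-shot classifier.
# The labels are illustrative; real brand-safety taxonomies are far more granular.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

labels = ["hate speech", "military conflict", "obscenity", "sports", "technology"]
page_text = "Troops advanced on the city overnight as shelling continued."

result = classifier(page_text, candidate_labels=labels)
top_label, top_score = result["labels"][0], result["scores"][0]

# An advertiser could block placement when the top category is on its exclusion list.
unsafe = {"hate speech", "military conflict", "obscenity"}
print(top_label, round(top_score, 3), "blocked" if top_label in unsafe else "allowed")
```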

Hoping to increase advertising revenue from companies that care about where their brands appear online, news organizations are reminding advertisers that they provide accurate, high-quality information. To the same end, many news organizations29 have joined coalitions that support quality advertising, such as United for News.30

Working with News Outlets—Another Path to Profit

While news organizations are using the brand-safety argument to obtain more advertising revenue, several startups are trying to sell services to newsrooms, including fact-checking, evaluating the origin of content, improving the tenor and quality of online discussions, and monitoring safety threats to journalists. However, news organizations are skeptical customers and many prefer to develop such products in-house so they can keep a tight grip on quality, safety, and ethical standards. Paul Glader, founder and chief executive officer of Vett News, said:

It’s hard to sell anything into the [media] industry right now. It will often only try new things if convinced beyond a shadow of a doubt that it will make the news publisher more money. So we plan to win more business with convincing data.31

Automated Fact-Checking Systems for News Outlets

These services verify written or spoken statements, numbers, and claims. They strive to build audience trust in the hope this will produce more revenue from audiences who value accurate information. For that reason, news organizations are collaborating with tech companies on such products.

Content Provenance

Some news organizations have become involved with tech companies to authenticate information and images. One example is the Adobe-led Content Authenticity Initiative, which has an open-source method of verifying content.32 It is also helping to establish standards for the field through the Coalition for Content Provenance and Authenticity.33
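The core idea behind content provenance can be sketched as a cryptographic signature that binds an asset's hash to its capture metadata, so later edits or tampering can be detected. This is a simplified illustration of the concept only, not the Coalition for Content Provenance and Authenticity specification; the metadata fields and key handling are hypothetical.

```python
# Sketch: bind an asset's hash and capture metadata under one digital signature.
# Simplified illustration only; the real C2PA standard defines a richer manifest format.
import hashlib
import json
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def sign_asset(asset_bytes: bytes, metadata: dict, key: Ed25519PrivateKey) -> dict:
    """Produce a signed manifest tying the asset's hash to its metadata."""
    manifest = {"asset_sha256": hashlib.sha256(asset_bytes).hexdigest(), "metadata": metadata}
    payload = json.dumps(manifest, sort_keys=True).encode()
    return {"manifest": manifest, "signature": key.sign(payload).hex()}

def verify_asset(asset_bytes: bytes, record: dict, public_key) -> bool:
    """Check that neither the asset nor the manifest changed since signing."""
    if hashlib.sha256(asset_bytes).hexdigest() != record["manifest"]["asset_sha256"]:
        return False  # asset was edited after signing
    payload = json.dumps(record["manifest"], sort_keys=True).encode()
    try:
        public_key.verify(bytes.fromhex(record["signature"]), payload)
        return True
    except InvalidSignature:
        return False  # manifest was tampered with

key = Ed25519PrivateKey.generate()
photo = b"\x89PNG...raw image bytes..."  # stand-in for a real image file
record = sign_asset(photo, {"device": "hypothetical-camera", "captured": "2022-08-06T12:00Z"}, key)
print(verify_asset(photo, record, key.public_key()))            # True: untouched
print(verify_asset(photo + b"edit", record, key.public_key()))  # False: content changed
```
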
Crowd-sourced Claims Checks

Netherlands-based nwzer takes a different tack with publishers, including newsrooms.34 The company encourages an audience-driven approach to verifying the accuracy of content, similar to that explored in James Surowiecki’s The Wisdom of Crowds. Its NLP-based algorithm lets readers self-moderate and self-regulate against mis/disinformation in publishers’ online comment sections. “You have to make sure that the crowd self-corrects [against mis/disinformation],” explained Karim Maassen, the company’s founder and chief executive officer. Founded in 2017 and funded by the Google News Initiative, nwzer says it earns revenue and is profitable.35

Memetica works to identify threats against journalists or other public figures for newsrooms and private security clients.36 “Existing platforms and law enforcement are still catching up with what it means to be an average person at the center of a disinformation campaign,” explained Ben Decker, its founder and chief executive officer.

Non-profit Efforts: Universities and Open Source

Along with private companies, universities have become involved in efforts to rein in mis/disinformation on the Internet. Foundations fund some of this work: in 2018 the Bill & Melinda Gates Foundation gave Harvard University a $250,000 grant “to understand the scale and nature of the mis- and disinformation problem and [to] determine how to effectively debunk health-related and other falsehoods traveling on social media platforms,”37 and the Center for Security and Emerging Technology at Georgetown University conducts non-partisan public policy research.38

There are also many academics researching AI and misinformation, such as Sarah Oates at the Philip Merrill College of Journalism at the University of Maryland39 and Kathleen McKeown40 at the Data Science Institute at Columbia University.41

Other universities are also incubating AI start-ups to tackle the mis/disinformation problem. Columbia Technology Ventures’42 Vidrovr43 creates technologies to analyze video, which is often used to spread mis/disinformation. Shih-Fu Chang, the interim dean at Columbia University’s Fu Foundation School of Engineering and Applied Science,44 is chief technical advisor at Vidrovr.

For-profit companies also work with universities. OpenAI makes its application programming interface available to help others “train” datasets with human input.45 Marvelous AI’s StoryArc analyzes narratives to track and quantify mis/disinformation.46 The data provided by StoryArc helped a research team from the University of Maryland’s Philip Merrill College of Journalism track character and identity attacks on Twitter targeting female candidates during the 2020 US presidential primaries.47

Conclusion: Regulation Will Create Innovation

Despite advances, technology alone will not solve the online mis/disinformation problem. Giant social media platforms have few financial incentives to crack down on this—quite the opposite, in fact. To push social media platforms to act against online mis/disinformation and illegal speech, regulations must deftly address the issue while preserving freedom of expression. Marvelous AI’s Danielle Deibler said:

We see ourselves as part of an ecosystem. One group is not enough to fight misinformation. You need policy and regulation and you need the social media companies and journalists to not spread and propagate [untruths or deceptive claims]. You also need people to help keep the government and journalists in check. Hopefully public sector companies and academics can do it.

There has been progress. The EU’s Digital Services Act (DSA), which was agreed in April 2022,48 and the United Kingdom’s proposed Online Safety Bill49 require platforms to conduct risk assessments and share plans with regulators to address potential harms caused by illegal content. In Europe, illegal content can encompass several forms of speech, including hate speech and incitement.

The DSA focuses on risks to society, while the UK bill focuses on risks to individuals. Germany’s NetzDG law, passed in 2017 and modified in 2021, was an inspiration for the DSA and includes fines for tech giants that have a pattern of knowingly disseminating illegal speech.50 French regulators say the DSA is similar to banking regulation because rather than supervising every transaction, it requires companies to build systems to mitigate risk.

Because the United States trails the United Kingdom and the EU in regulating technology platforms, and because US courts have upheld an expansive view of the First Amendment in recent years, Europe may well set the standard for other countries. What this means for the future of the niche firms profiled here remains to be seen. Nonetheless, regulation is likely to continue to evolve, and laws about online harm will spur demand for the types of services described in this paper as well as new opportunities for innovation. With regulation coming sooner in the EU than in the United States, there may be short-term business opportunities for European companies trying to identify potentially harmful mis/disinformation online.

It is also important to note that the types of technologies developed to assess content at scale can be employed quite differently by authoritarian regimes that seek not to create guardrails that preserve free expression, but rather to contain or limit it. Firms in this field must be mindful of the environments in which they operate, which can change suddenly. There are no easy answers when it comes to governing human expression, only trade-offs.
