10 November 2022

Quantifying Cyber Conflict: Introducing the European Repository on Cyber Incidents

Matthias Schulze

Is cyberwar getting better or worse? Are cyber operations increasing or decreasing? This basic fact of cyber conflict (the total number of operations) is often hard to grasp. Depending on whom a curious individual chooses to listen to, estimates of the scale of cyber conflict range from thousands to billions of incidents each year. Various stakeholders offer different views on the cyber threat landscape, some of which are classified and some of which are open to the general public. Because most quantifiable data on cyber conflict is shrouded in secrecy within intelligence agencies and military cyber commands, civilians lack a shared, informed cyber situational awareness.

The public lacks quantifiable data about cyber conflicts. But what about more traditional conflicts? Comparatively, there are numerous databases available that help to quantify boots-on-the-ground warfare. Access to this information allows observers to draw more generalizable, concrete conclusions to answer questions about the phenomenon of war. Is it increasing or decreasing? Are wars becoming more deadly? Are there regional hot spots? And if so, where? Are there trends in the motivations to start wars, such as increasing conflicts over resources?

This crucial data is currently unavailable to the public for cyber conflicts. Thus, knowledge about this new type of conflict is often speculative and based on assumptions. For example, this year alone multiple organizations (such as Cloudflare, Microsoft, and Google) claimed to have witnessed the “biggest” cyberattacks in history. These claims make nice media headlines, but they lack a common yardstick: The biggest attack compared to what? Targeted systems? Economic damage? Number of packets transferred? Duration of an outage? These common classification criteria are missing, making it nearly impossible for observers to make accurate comparisons between different cyber incidents.

To compensate for this lack of empirical data, most academic research in the field consists of studies that examine only a few instances—often fewer than five—of cyber conflict, with little variety in case selection. A Google Scholar search reveals that the Stuxnet attack in 2010 (15,000 hits), Russia’s cyberassault on Estonia in 2007 (20,000 hits), and the Democratic National Committee hack in 2016 (54,000 hits) are among the most discussed cases in the cybersecurity community. While the limited research on this topic poses challenges, some developments have been made in recent years to address the problems described above. For example, U.S. researchers Brandon Valeriano and Ryan Maness created the Dyadic Cyber Incident and Campaign Data (DCID), an early and impressive dataset on cyber conflict dyads—the cyber interactions between conflicting parties. Additionally, the Council on Foreign Relations created the easy-to-use Cyber Operations Tracker, the Center for Strategic and International Studies generated an expansive cyber incident list, and the Cyber Peace Institute drafted a unique timeline of cyber operations within the context of the ongoing war in Ukraine. The private sector has also formed initiatives to help quantify cyber conflict that are visually appealing but lack depth, such as cybersecurity provider Kaspersky’s real-time cyberthreat map.

These resources provide some valuable insights but leave out many details. First, most of these resources offer only a superficial glimpse into cyber conflict and lack many relevant data points. For example, most include only a few technical indicators, which makes accurate comparisons between different incidents difficult. Notably, the core issue of most quantitative cyber conflict research—attribution and the uncertainty regarding the responsibility of state actors—is often not addressed.

Second, many of these projects do not follow a standardized, strict scientific methodology, leaving open questions about what data is included and what is not. How are incidents classified and coded? Are there reliability checks among coders? Are incidents reassessed once new information arrives to ensure data quality?

Third, while more scientific projects—such as DCID—offer richer data, the provided dataset is often hard to use. Conversely, projects that are more visually appealing and easier to use tend to lack scientific rigor, creating a trade-off between data richness/quality and ease of use. In most projects available to the public, users can do very little with the data provided. For example, users can rarely perform statistical analysis (with the notable exception of DCID). There is also currently no way to customize data to specific stakeholder needs—and the potential to visualize the data to make it intelligible is often not realized. Notably, there are no trends, graphs, or other handy ways to draw new conclusions from the data. In short, these projects have a lot of room for improvement, especially given the possibilities of modern data analysis and visualization.

Fourth, since the U.S. is the pioneer in cyber conflict research, most current projects lack a European focus. This has two implications. First, predominantly U.S.-centric research might introduce biases into the data and uneven visibility into the cyber threat landscape by omitting incidents in smaller EU countries, such as those in the Balkans. Second, while the EU-U.S. partnership is strong, the EU should take on more responsibility for its own security, rather than relying largely on U.S.-based and U.S.-focused cyber research.

Turning Cyber Conflict Data Into Action

What can be done to address these challenges? Enter the European Repository of Cyber Incidents (EuRepoC), a new cyber conflict dataset launched by a group of researchers (including myself) from European universities and think tanks. It features data on more than 1,400 different cyber operations worldwide, reaching back to the 2001 Chinese intrusions into the U.S. Department of Defense, and the dataset is growing constantly. With the help of data mining, machine learning, and natural language processing, our program scrapes data on new cyber incidents daily to add them to the database. Human coders then evaluate and classify the incidents. Our methodology is peer reviewed, transparent, and open to feedback from the community.
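To make the ingest step concrete, here is a minimal Python sketch of how daily scraped reports might be scored and queued for human coders; the keywords, field names, and threshold are illustrative assumptions rather than EuRepoC’s actual tooling:

from dataclasses import dataclass
from datetime import date

# Keywords used as a crude relevance filter (illustrative, not EuRepoC's actual list).
INCIDENT_KEYWORDS = {"ransomware", "ddos", "wiper", "data breach", "espionage", "apt"}

@dataclass
class CandidateReport:
    title: str
    summary: str
    source_url: str
    published: date
    score: float = 0.0  # relevance score filled in by triage()

def score_report(report: CandidateReport) -> float:
    """Count keyword hits in title and summary as a rough relevance signal."""
    text = f"{report.title} {report.summary}".lower()
    return float(sum(kw in text for kw in INCIDENT_KEYWORDS))

def triage(reports: list[CandidateReport], threshold: float = 1.0) -> list[CandidateReport]:
    """Return reports that clear the threshold, ordered for the human coding queue."""
    for r in reports:
        r.score = score_report(r)
    return sorted((r for r in reports if r.score >= threshold),
                  key=lambda r: r.score, reverse=True)

if __name__ == "__main__":
    sample = [
        CandidateReport("Hospital hit by ransomware", "Attackers encrypted patient records.",
                        "https://example.org/report-1", date(2022, 11, 9)),
        CandidateReport("New phone released", "A vendor announced a new device.",
                        "https://example.org/report-2", date(2022, 11, 9)),
    ]
    for r in triage(sample):
        print(f"flagged for coders: {r.title} (score {r.score})")

In practice, flagged candidates would then go to the human coders described above, who make the final call on inclusion and classification.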

We have had long debates about what and how many cyber incidents to include in the EuRepoC. It would be nearly impossible to include every single one of the millions of distributed denial of service (DDoS) attacks that are launched each year. Therefore, we chose to include only those cyber operations that resulted in a policy response from targeted nations, such as indictments or sanctions, and those that entered the policy discourse more generally. We also include cyber operations that targeted political entities, caused a high degree of damage and impact on targets, or both. To gauge political significance, we established an iterative coding loop: Old incidents are frequently reassessed and updated to check whether they have gained political traction or whether new information, such as an operation’s attribution, has emerged.

To fill the gaps left by the less comprehensive datasets described above, cyber incidents in the EuRepoC are classified and coded with over 60 different variables—such as political categories and legal dimensions—that reflect an interdisciplinary approach and the current state of research in the field. Political categories include characteristics of targets (like sector, critical infrastructure, damage, and effects), attackers (states, proxies, non-state actors, and advanced persistent threat code names), attribution information (who attributed the attack to whom, in which form, and how quickly), and policy response to attacks (such as sanctions or other diplomatic measures).
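As a rough illustration of what such a coded record can look like, the Python sketch below models a hypothetical incident with a handful of these variables; the field names and values are simplified assumptions, not the full codebook:

from dataclasses import dataclass, field

@dataclass
class CodedIncident:
    """A simplified, hypothetical slice of the coding scheme described above."""
    name: str
    start_year: int
    # Target characteristics
    target_country: str
    target_sector: str                 # e.g., "government", "energy", "finance"
    critical_infrastructure: bool
    # Attacker characteristics
    attacker_category: str             # e.g., "state", "proxy", "non-state"
    apt_name: str | None = None        # advanced persistent threat code name, if any
    # Attribution and policy response
    attributed_by: list[str] = field(default_factory=list)     # who made the attribution
    attribution_speed_days: int | None = None                   # how quickly attribution followed
    policy_responses: list[str] = field(default_factory=list)   # e.g., sanctions, indictments

example = CodedIncident(
    name="Illustrative espionage operation",
    start_year=2021,
    target_country="DE",
    target_sector="government",
    critical_infrastructure=False,
    attacker_category="state",
    apt_name="APT-Example",
    attributed_by=["targeted government", "private security vendor"],
    attribution_speed_days=90,
    policy_responses=["diplomatic statement"],
)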

We also track whether cyber operations are taking place within the context of an analogue conflict, crisis, or war. For that purpose, we correlate our data with the Heidelberg Conflict Barometer, a project that tracks global conflict dynamics such as conflict issues, intensities, and casualties.

Additionally, we collect data on the legal dimension of cyber operations: What type of legal response was issued after an attack (such as indictments or sanctions)? What areas of international law were affected and invoked by responding states? And could legal countermeasures be warranted?

We also include technical variables derived from MITRE ATT&CK, a technical framework that is used by the information technology (IT) security community to compare tactics and techniques of threat actors. What were the initial access vectors of the attack? Were zero-day vulnerabilities used? What was the technical impact of an attack (disruption, destruction, or physical effects)? And so on. Using this data, we are able to compute accurate estimates of certain indicators, such as the average intensity of a cyber operation, its economic or political impacts, and indicators that show what international legal response could be justified, such as retorsions or other countermeasures. This methodology allows us to draw holistic conclusions that are relevant for a variety of stakeholders, including policymakers, lawyers, the IT security community, academics, and—more broadly—civil society.
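To show how such indicators might be derived from coded technical variables, here is a small Python sketch that computes a rough intensity score; the weights and field names are assumptions for demonstration and do not reflect EuRepoC’s published scoring method:

from statistics import mean

# Illustrative weights for technical impact categories (assumed, not EuRepoC's).
IMPACT_WEIGHTS = {"disruption": 1, "destruction": 3, "physical": 5}

def incident_intensity(incident: dict) -> float:
    """Combine technical impact, zero-day use, and critical-infrastructure
    targeting into a single rough intensity score."""
    score = IMPACT_WEIGHTS.get(incident.get("technical_impact", ""), 0)
    score += 2 if incident.get("zero_day_used") else 0
    score += 2 if incident.get("critical_infrastructure") else 0
    return float(score)

def average_intensity(incidents: list[dict]) -> float:
    """Average intensity across a set of coded incidents."""
    return mean(incident_intensity(i) for i in incidents) if incidents else 0.0

# Example: two coded incidents with different technical profiles.
incidents = [
    {"technical_impact": "disruption", "zero_day_used": False, "critical_infrastructure": True},
    {"technical_impact": "destruction", "zero_day_used": True, "critical_infrastructure": True},
]
print(average_intensity(incidents))  # scores of 3 and 7 average to 5.0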

The EuRepoC dataset is therefore one of the most expansive data collections on cyber conflict available to the public: It covers the entire life cycle of cyber operations, from initial access to attribution to the resulting political and legal action. Our goal is not just to collect data on cyber conflict and thus fill large research gaps. We want to turn data into action. To achieve this, we had to make our dataset easy to use. Thus, we launched an interactive, multimedia dashboard that visualizes the data we collect with powerful graphics. Our dataset overcomes the difficult trade-off between richness/quality and ease of use: EuRepoC offers scientifically accurate data and rich visualizations that are easy to understand, even for a nontechnical audience.

For example, EuRepoC’s cyber operations dashboard clearly represents the geographic distribution of cyber operations, such as what country faced what types of attacks, by whom, and how often. Using data visualizations, we also show trends, such as whether certain types of cyber operations (like data theft, doxxing, or disruptions) are increasing or decreasing in selectable time frames. EuRepoC also allows users to visualize the distribution of targets per economic/public sector as well as the total number of incidents in a given country and time frame. Furthermore, a powerful table view gives users detailed insights into individual cyber operations and their relevant characteristics. The table view also helps facilitate the comparison of different incidents and provides insights into cyber campaigns, which are multiple individual operations that work toward the same goal.

Though the user-friendly dashboard should satisfy the needs of a general audience, we also offer direct access to the entire dataset so that scholars from all fields can analyze the data on their own and perform statistical analysis, in the hopes of empowering European research on cyber conflict. In addition to full access to the dataset and the powerful visualizations on the interactive dashboard, EuRepoC offers tailored information for users. Via the Personalized Information Space, users can receive updates to the dataset directly in their email inbox, such as insights into the latest cyber operations and selected weekly statistical trends. For a broader perspective, EuRepoC also offers spotlight articles and reports on the general dynamics of cyber conflict.
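To give a sense of what direct dataset access enables, the short Python sketch below computes a yearly trend by incident type with pandas; the file name and column names (start_date, incident_type) are assumptions about a CSV export, not the dataset’s actual schema:

import pandas as pd

# Load a hypothetical CSV export of the dataset (file and column names assumed).
df = pd.read_csv("eurepoc_export.csv", parse_dates=["start_date"])
df["year"] = df["start_date"].dt.year

# Count incidents per year and type, e.g., to check whether disruptions are rising.
trend = df.groupby(["year", "incident_type"]).size().unstack(fill_value=0)
print(trend.tail())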

Our Vision

Our data is strictly open source, and we never use classified material. Our main goal is to make the EuRepoC dataset transparent to offer academia and civil society a glimpse into a domain that is often shrouded in secrecy, and also to encourage feedback from the expert community in the hopes of increasing the data’s quality. This accessible, open-source approach will hopefully empower scientific interdisciplinarity, inform policymakers, and stimulate civil-society expertise concerning cyber conflict.

Currently, the EU’s response to cyber activity is based on case-by-case assessment. This practice ultimately leads to inconsistent responses to different operations. For example, relatively harmless operations such as the 2015 Bundestag hack espionage operation in Germany are treated similarly to major disruptive operations like the NotPetya attack, which caused billions of dollars in damages. Responses to larger, more disruptive attacks are also often mishandled. For example, France issued no sanctions against Russia in response to Russian cyber interference in the 2017 French election. As others have noted on Lawfare, ill-suited legal and political responses to cyber incidents make EU cyber policy rather incoherent, which goes against EU principles that responses to cyber operations should be proportionate. EuRepoC will hopefully help empower EU cyber diplomacy. As described above, our dataset creates a common yardstick to compare and classify cyber incidents. More accurate assessments of cyber operations should prompt proportionate responses to future incidents, enabling policy responses such as sanctions to be tailored to the severity of an operation.