What does cyber conflict actually look like? Do adversary states exhibit patterns of behavior in the cyber domain that make them susceptible to deterrence efforts? And are cyber operations better understood as a constellation of one-off events, or are there rhythms and discernible trends that connect these operations into a defined landscape?

These are questions researchers have long grappled with, and they have implications for planners, policymakers, and the public alike. We can speculate and hypothesize (and there is value in doing so in an informed fashion), but answering these questions with the greatest possible detail and nuance requires a very specific input: data.

That’s why our research team recently published version 2.0 of the Dyadic Cyber Incident Dataset (DCID). We explain in detail why we believe this dataset is important in a forthcoming article in The Cyber Defense Review, scheduled to appear in early 2023 and available now on SSRN. We believe there is an immediate need for this data in the policy and strategic community, and we invite others to use the data to further their own research.

This comprehensive dataset of interstate interactions covers 2000 to 2020 and can help military analysts and practitioners understand, at the strategic level, the behavioral patterns of the United States’ four main nation-state adversaries: China, Russia, Iran, and North Korea. Specific operations and tactics can then be developed for each adversary to improve deterrence in cyberspace, an important domain in the integrated deterrence concept DoD put forth in 2022.

What Is DCID?

The cybersecurity and national defense communities require an open-source resource on cyber actions as those actions become an ever-increasing threat to global stability. The DCID is the only peer-reviewed source of cybersecurity conflict incident data. To keep data collection manageable, the dataset focuses on state actions between ongoing rivals.

Two of the authors first demonstrated the data in a 2012 Foreign Affairs article, and the initial version was published in the Journal of Peace Research in 2014. The data was later updated to support the books Cyber War versus Cyber Realities (2015) and Cyber Strategy (2018).

Since its initial publication, the data has been used in multiple peer-reviewed publications and has provided a wealth of information on nation-state cybersecurity activities. After a first major expansion to version 1.5, the newly released version 2.0 represents another dramatic expansion and a rebuild of the data, made possible by collaboration with external partners, including US Army Cyber Command and the Naval Postgraduate School, that seek to examine the domain through advanced social science methods.

Why Data Matters

All too often, pundits rely on guesswork to make empirical claims about cyber interactions without testing their theories against recent historical examples. We hope to move the field of cybersecurity away from prognostication and push the community toward verifying empirical claims. This is critical for the military community, which cannot base its practice on fictions.

Yet, the collection of international security data is always an ongoing process. No dataset is ever final or complete; rather, there are only different versions of the data. We constantly strive to update and maintain this data as it represents an important independent resource to the community. That’s why we have published DCID version 2.0.

What Is Included in DCID?

Some have said that collecting cybersecurity interaction data is nearly impossible because cyber interactions are secret. Yet there have been many efforts to collect such data, because operating in a covert domain does not prevent an operation from being identified and coded. The task does grow more difficult over time as nation-state actors seek to avoid attribution for their cyber actions. To keep data collection manageable, we focus on actions between active international rivals. Eventually, our goal is to produce automated versions of the data using machine-learning algorithms, but the current state of the art remains human-driven data collection.

The DCID is a full dataset with more than twenty variables. We consulted other cybersecurity data collection efforts that have emerged over the years, and this data represents the most advanced accounting of legitimate nation-state action in the cyber domain. While terabytes of data may be available on the technical aspects of cyberattacks, DCID focuses on identified cyber operations between nation-states, and a single operation can include many different cyberattacks. For now, we exclude nonstate actor incidents and criminal incidents, focusing instead on interstate interactions as they relate to international security. However, we have added a binary indicator to account for initiation by third-party actors.

Multiple coders collected the data from a variety of sources to ensure redundancy. Each data point includes a news article from an external source summarizing the incident and, new for version 2.0, a technical report associated with the incident. This allows for both technical and political forms of identification. The team leads then reviewed each data point for accuracy and adherence to coding standards and, finally, conducted reliability checks to ensure those standards were applied consistently over time.

While the United States government does not own or produce the data, numerous government personnel contributed to the project. We cannot eliminate this potential bias; we can only seek to moderate it through openness, consistent standards, and collaboration with the community on constant updates. DCID version 2.0 contains more US cyber incidents than existing collection efforts. Our researchers gathered information on US incidents using the same open-source methods applied to every other state in the database. In other words, US incidents were included based on public attribution and do not represent official government attribution.

What Do We Find?

Version 2.0 of DCID now contains 433 incidents. This new version expands the timeline and adds new variables for such factors as nation-state-enabled ransomware, supply chain attacks, critical infrastructure sector attacks, and connections to ongoing information operations. The key variables include the actor, incident type, target, method, severity, and objectives. (The forthcoming Cyber Defense Review article and the codebook we published along with the dataset provide a complete accounting of methods and variables.)
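To make the unit of analysis concrete, the sketch below models one dyadic incident as a Python record carrying the key variables listed above, plus the ten-point severity scale and third-party indicator described in this piece. It is a minimal illustration, not the DCID schema: the class name `DyadicIncident`, its field names, and its validation rules are hypothetical, and the published codebook defines the actual variables and codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DyadicIncident:
    """One interstate cyber incident between two states (hypothetical schema).

    Field names are illustrative only; the DCID codebook defines the real
    variables and their permitted codes.
    """
    initiator: str             # state that launched the operation
    target: str                # state on the receiving end
    incident_type: str         # e.g. "espionage", "disruption", "degradation"
    method: str                # coded technique used in the operation
    severity: int              # 1 (probing) to 10 (massive death), per the ten-point scale
    objective: str             # strategic aim of the initiator
    third_party: bool = False  # binary flag for initiation by a third-party actor

    def __post_init__(self):
        # Enforce the ten-point severity range.
        if not 1 <= self.severity <= 10:
            raise ValueError(f"severity must be between 1 and 10, got {self.severity}")
        # A dyad needs two distinct states.
        if self.initiator == self.target:
            raise ValueError("initiator and target must be different states")


# Placeholder states, not a real DCID record.
example = DyadicIncident("State A", "State B", "espionage", "phishing", 2, "information gathering")
```

Keeping the record immutable (`frozen=True`) reflects the idea that a coded incident should change only through a deliberate recoding step, not by accident during analysis.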

Analyzing the dataset yields several important findings. First, espionage operations continue to dominate the dataset, accounting for 61 percent of all incidents. Simple disruptions make up 28 percent of the data, while more serious degradation attacks account for only 10.7 percent.
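Shares like these come from a simple tally over incident-level records. The following is a minimal sketch, assuming each record carries an incident-type field; the field name `incident_type`, the helper `type_shares`, and the five toy records are hypothetical and are not DCID entries.

```python
from collections import Counter

# Hypothetical toy records for illustration; not drawn from the DCID.
toy_incidents = [
    {"incident_type": "espionage"},
    {"incident_type": "espionage"},
    {"incident_type": "espionage"},
    {"incident_type": "disruption"},
    {"incident_type": "degradation"},
]


def type_shares(incidents, field="incident_type"):
    """Return each incident type's share of all incidents, as a percentage."""
    counts = Counter(record[field] for record in incidents)
    total = sum(counts.values())
    return {kind: 100 * n / total for kind, n in counts.items()}


shares = type_shares(toy_incidents)
print(shares)  # → {'espionage': 60.0, 'disruption': 20.0, 'degradation': 20.0}
```

Run over the full set of coded incidents, the same tally yields the kind of breakdown reported above.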

Second, it is no surprise that China, Russia, Iran, North Korea, and the United States are the most active states in the data, but other actors such as Pakistan, Israel, and Ukraine also play a part in the domain as either targets or attackers. Notable increases among targets include the United Kingdom (333 percent increase from DCID 1.5), Turkey (175 percent increase), and Vietnam (100 percent increase).

Third, as total incidents expanded from 266 in version 1.5 to 433 in version 2.0, concessions declined as a share of the data from 4.5 percent to 2.8 percent. Concessions are very rare in cyber interactions and have become rarer as the data expands to cover more incidents. We also found that 22.4 percent of cyber operations have associated information operations, indicating a clear association between cyber and information operations over time.
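The decline in the concession share can be checked with back-of-envelope arithmetic. The sketch below assumes the published percentages are rounded shares of the stated totals (266 and 433 incidents). Under that assumption, both versions imply roughly a dozen concessions, so the falling share reflects growth in total incidents rather than a drop in the number of concessions. This is an inference from the reported figures, not a count taken from the dataset, and the helper name `implied_count` is hypothetical.

```python
def implied_count(share_percent, total):
    """Back out an approximate count from a (rounded) percentage share of a total."""
    return share_percent / 100 * total


v15 = implied_count(4.5, 266)  # about 11.97
v20 = implied_count(2.8, 433)  # about 12.12

print(round(v15), round(v20))  # → 12 12
```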

Finally, we developed a ten-point scale to identify the severity of individual incidents, ranging from the low end of one (probing/packet sniffing without kinetic cyber) to the high end of ten (massive death as a direct result of a cyber incident). Few incidents reach levels five or six, and none rank higher. Level-four incidents, however, have become far more common: there are now 115, an increase of 117 percent since version 1.5.
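The 117 percent figure can be inverted to estimate the version 1.5 baseline. Assuming the increase is measured relative to the version 1.5 count, 115 level-four incidents implies about 53 in version 1.5. The helper names below are hypothetical, and the result is an estimate from rounded figures rather than a published count.

```python
def percent_increase(old, new):
    """Percentage change from an old value to a new value."""
    return (new - old) / old * 100


def implied_baseline(current, pct_increase):
    """Invert current = baseline * (1 + pct_increase / 100) to recover the baseline."""
    return current / (1 + pct_increase / 100)


baseline = implied_baseline(115, 117)
print(round(baseline))                   # → 53
print(round(percent_increase(53, 115)))  # → 117, matching the reported increase
```

The round trip (baseline, then percentage increase back to the current count) is a quick consistency check on any reported "X percent increase to N incidents" figure.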

What Comes Next?

In short, this dataset update enables cybersecurity researchers to test their theories with greater accuracy or to conduct more fine-grained analysis of a well-defined constellation of cases. The data supports rich quantitative analysis as well as thick qualitative analysis.

With this data, one can explore, for instance, which states leverage cyber operations and how successful they are, or the targets, sectors, and impact of cyber operations by type and over time. We have previously demonstrated that cyber operations can have foreign policy consequences, a finding that anticipated Albania’s reaction to Iran’s recent attack. In this notable incident, Albania weighed invoking NATO Article 5 and swiftly expelled Iranian diplomats after Iran launched destructive cyber operations in response to Albania hosting a conference of the MEK, an Iranian opposition group banned by Tehran. Other researchers can also explore claims of attribution, escalation, coordination, and cross-domain dynamics. William Akoto, for example, recently used the DCID data to explore the cyber implications of international trade.

Currently, our team is considering how to incorporate information operations more fully into the data, eventually transforming the dataset into a Dyadic Information Incidents Dataset (DIID). This expansion could not come at a more crucial time, given the new Joint Publication 3-04, Information in Joint Operations, approved in September 2022. Assessing new DoD joint operations in the information environment, where cyber operations will play a central role, will become increasingly important, and the DCID can help.

We are also actively coding cyber operations during the Russo-Ukrainian War using several different data collection methods, both to maintain awareness of the war as it unfolds and to provide multiple sources of incident data on this important conflict, which we hope will reduce bias in data collection.

We encourage readers to contact any of the authors if they wish to contribute, point out errors, or offer suggestions for future efforts. Ultimately, we hope DCID version 2.0 serves as a valuable resource for the entire cybersecurity community. As we noted at the outset, answering difficult questions about security, escalation, and deterrence in the cyber domain, and meeting critical ongoing security challenges, requires data.

Ryan C. Maness is an assistant professor in the Department of Defense Analysis and the director of the DoD Information Strategy Research Center at the Naval Postgraduate School.

Brandon Valeriano is a distinguished senior fellow at the Marine Corps University and a senior advisor to Cyberspace Solarium 2.0.

Kathryn Hedgecock is an assistant professor of international affairs at the United States Military Academy at West Point.

Jose M. Macias is an incoming master of public policy student at the University of Chicago’s Harris School of Public Policy and a Pearson fellow with the Pearson Institute for the Study and Resolution of Global Conflicts.

Benjamin Jensen is a professor at the School of Advanced Warfighting at the Marine Corps University and a senior fellow for future war, gaming, and strategy in the International Security Program at the Center for Strategic and International Studies.

The views expressed are those of the authors and do not reflect the official position of the United States Military Academy, Department of the Army, or Department of Defense, or that of any organization the authors are affiliated with.

Image credit: Sgt. Tom Lamb, US Army National Guard