13 July 2022

A massive leak of Chinese government data on hundreds of millions tests a new privacy law

Lili Pike, Benjamin Powers and Jason Paladino

For sale: personal information on “billions” of Chinese citizens. That’s the offer a person with the alias ChinaDan made on a hacking forum on June 30, proposing a price tag of 10 bitcoins, or roughly $200,000, for the data trove. If the breach is as expansive as claimed, it could be one of the largest in history.

Grid downloaded and reviewed a sample data set the hacker made publicly available and found over 700,000 records containing sensitive personal information, ranging from ID numbers to marital status, religion and crime records.


A screenshot captured on Tuesday shows an announcement about the data leak posted on BreachForums on June 30. (BreachForums)

It’s not just the size of the hack that’s noteworthy but also the source. The hacker claimed that the data originated from the Shanghai government’s national police database, a claim that aligns with reports in recent years about the government’s extensive collection of data on its citizens for security purposes.

“In comparative perspective, the Chinese state is increasingly adept at collecting a broad range of information, and government officials would justify that this is done in the name of public security, i.e., residents who have done nothing wrong have nothing to fear,” Suzanne Scoggins, an assistant professor at Clark University who studies Chinese policing, told Grid.

This data breach is the latest warning about how such information is being wielded by the government to surveil its population. But it also reveals the government’s surprising failure to protect its own sensitive data collection. The theft didn’t actually require advanced hacking; the Wall Street Journal reported that a portal to access and manage the data was left open, without a password, making it vulnerable to theft.

And the publication of the data could seriously threaten the online and physical security of the Chinese citizens affected. “I checked — this data is real. It includes information on me and my friend,” wrote one Chinese Reddit user in response to the hack. “I really have to throw up.” Grid, the Wall Street Journal and the New York Times have each verified some of the data in the breach.

Perhaps most significant are the police reports in the data set. They detail crimes dating back to the 1990s, and their exposure could have profound implications for the lives of the people named in them.

This breach comes at an interesting moment — China recently passed a sweeping law that restricts the public and private sectors’ use of personal data. At a time when data privacy concerns are rising worldwide, the law is seen as a step forward compared with the lack of national data privacy regulations in many countries. But it has major weaknesses: allowing the state to sweep up data as it deems necessary under existing statutes and charging the government with policing itself when it comes to data protection. The government’s silence and censorship so far, in the face of this major data breach, point to the limits of accountability and the lack of recourse for citizens when such accountability fails.

Soon after the law went into effect, Alexa Lee, a nonresident fellow at the Harvard University Belfer Center’s Cyber Project, told Grid, “Because of China’s unique system, I don’t see how they can meaningfully protect individuals from the government if they really want to get data from their citizens.”

How was such a cyber superpower so vulnerable to data theft?

The Chinese government has extremely sophisticated cybersecurity and cyber warfare abilities, but that doesn’t mean that particular state organs such as the Shanghai police are equally sophisticated. The distinction is similar to the one between the computer systems of the New York or Houston police departments and those of the National Security Agency. Michael Yaeger, a shareholder at the law firm Carlton Fields who focuses on cybersecurity matters, said that in almost all cases, offensive cyber capabilities are stronger than defensive ones.

“These types of attacks are surprisingly common,” said Allan Liska, an intelligence analyst at the cybersecurity firm Recorded Future. “Many organizations unintentionally leave insecure or poorly secured databases exposed to the internet. Knowing this, there is a subset of cybercriminal that spends their time scanning for these exposed databases and for credentials that they can use to steal the data contained in those databases.”

And while $200,000 for a claimed billion records works out to $0.0002 per record, Liska said that can still make threat actors some money, even if they don’t get the full price.

What’s in the database

Grid reviewed the sample data set containing reams of personal data. In one file, each individual’s name is listed along with information including their address, national ID number, and even education level, military service, marital status, religion, ethnic background, and links to photos from IDs, hotel check-ins, travel checkpoints and police detainment. At least 180 people in the sample are classified as belonging to the Uyghur minority, which has faced severe human rights violations in the Xinjiang region, and thousands more in the sample data reside there. “Key people” are also identified in the file. According to official documents, the term refers to people who are of particular interest to the Chinese authorities based on their political views, religion, criminal histories and other factors.

A separate file includes data that appears to come from an unusual source: food and delivery orders. An individual’s address, phone number and sometimes even delivery instructions are included. “Between 3:30 and 5 pm, you can call to notify for pick-up, otherwise you can leave it in [the convenience store]. Thanks for your cooperation!” one such message reads. With a historical data set of an individual’s deliveries, police could build an accurate picture of their habits and whereabouts.

Law enforcement collection of food delivery data is documented in a Human Rights Watch report, which detailed how Chinese authorities have deployed “Police Cloud” software that collects and analyzes everything from “medical history, to their supermarket membership, to delivery records” and can alert police officers to changes in behavior. The software is designed to let users target specific groups of people deemed “suspicious.” The newly leaked data, which shows orders from across the country, underscores how far-reaching these collection efforts are, if in fact the data was hosted on the compromised police server, as the hacker asserted.

“The depth and breadth of information collection will be eye-opening to many Chinese citizens,” said Di Wu, a senior threat intelligence analyst at Recorded Future.

A third file contains the most sensitive information in the sample. The records list calls citizens have made to the police, as well as the ensuing reports. The data, which in some cases contains identifying information about the callers and suspects, includes records of crimes such as theft, domestic violence and rape. In one instance, a woman went to a police station to report that she had been raped repeatedly over two years while working as a nanny. The name of the victim, the name of the man she accused and the address where the alleged rapes occurred are all included in the database.

Will the Chinese government crack down on its own?

Grid contacted several people affected by the hack. One person, whose address, name and phone number were included in the sample, told Grid via the Chinese messaging app WeChat, “That’s weird. Is my data really of use to them?” and asked what could be done.

The first question is, who is to blame?

According to Rogier Creemers, an expert on Chinese cyber policy at Leiden University, the hacker would be criminally responsible for the breach in this case, but China’s new data law could potentially hold the government accountable too, at least on paper.

The Personal Information Protection Law, which came into force last November, is the country’s first comprehensive effort to curb the misuse and abuse of personal data. The law came in response to rampant fraud and Chinese tech juggernauts vacuuming up consumer data. It also applies to the government’s use of data, but experts told Grid it has some serious flaws when it comes to accountability for officials.

The law requires data collectors to protect the data in their possession. It also states that no entity should collect data that is not specifically tied to its service or product. So, a customer ordering a milk tea may be asked for their flavor preferences but not for unrelated information such as their gender, phone number and date of birth. Another key principle: consent. Data collectors — including government offices — are required to give individuals an opportunity to decide whether their data can be used, and extra protection is mandated for sensitive categories of information including health, religious beliefs and finances.

It seems that these principles may have been violated in the Shanghai hack, but the law “introduces overbroad and vague exceptions to limitation on state authorities, such as the police,” said Michael Caster, Asia digital program manager at Article 19, a human rights organization. For instance, authorities don’t have to notify people about data collection “where laws or administrative regulations provide that confidentiality shall be preserved” or “where notification will impede state organs’ fulfillment of their statutory duties and responsibilities.” So as long as the government deems food order data necessary, it seems that it has the leeway to quietly collect such data.

If the government were found to have violated the law despite these exemptions, how would it be held to account? Yaeger, of Carlton Fields, said the law specifically addresses cases in which state organs fail to meet their obligations under it, but the means of recourse are limited.

“One thing to know is that this does not seem to provide, for example, the right to sue the government,” said Yaeger, noting that the law does allow public interest lawsuits against corporations. “Putting aside the question of whether the Chinese government would choose to hold itself responsible for this, on its own terms it doesn’t allow for something like private litigation.” Instead, agencies that violate the law are subject to punishment from within the government, and the person in charge of the practice in question can be punished. In this case, that could mean the Ministry of Public Security punishing the individuals in the Shanghai police department who failed to protect the data. After some previous smaller-scale data leaks, the local departments responsible were reprimanded by authorities, according to the New York Times.

But the government hasn’t issued an official response to the breach yet; instead, it has chosen to widely censor discussion of it on Chinese news and social media.

Putting China’s data issues in perspective

It’s worth recognizing that China is far from alone in collecting vast amounts of personal data and failing to protect it. Yaeger, for example, was an assistant U.S. attorney when the Office of Personnel Management was hacked in 2015, exposing the personal data of public servants. The hack, largely attributed to the People’s Liberation Army of China, affected about 21.5 million people. The United States experienced the most data breaches of any country last year, according to the cybersecurity firm Surfshark.

Asked about the Chinese government collecting only “necessary and proper” data, Yaeger described how the definition of “necessary” can be malleable.

“In some situations, information is going to be protected health information, and the same information collected in other contexts might not be,” said Yaeger. “So just taking a more ordinary understanding of the English translated word ‘necessary’ — necessary in one context may not be necessary in another. Your distinction between a private entity and a state entity could bear on that question, too.”

On the one hand, China’s new law gives it stronger national rules than many countries have for defining the scope of data collection and how data should be protected. The United States, for example, has no privacy law akin to China’s or the European Union’s General Data Protection Regulation. The country has instead seen a patchwork of state laws passed under the influence of lobbyists.

On the other hand, in the U.S., citizens can still sue the government for data infringements.

Caster said that “China already has the most sophisticated techno-authoritarian data-driven policing system in the world, and attempts at regulating personal data protection will always be flawed if they permit blanket exceptions for state actors within such a grossly privacy invading system.”

“That said, the only sufficient remedial measure in response to a leak of this size would be to systematically scale back the personal data collection and retention powers of the state,” he said. “It should serve as a powerful reminder that no actor, whether private or public institutions, should be allowed to collect and retain such large troves of personal information. We need more privacy protections to prevent the collection and retention of personal data, not just cybersecurity solutions to make leaks less common.”
