What do censorship and surveillance programmes look for? What can this tell us about internet usage in China?

Can we contrast with the perceived surveillance state of the West? What are the implications for a company in the Chinese market?

Unsurprisingly, there are lots of questions still to be answered about the state of the internet in China.

First Monday has this month published a very interesting paper, presenting an analysis of data from a year and a half tracking the censorship and surveillance keyword lists of two instant messaging (IM) programs used in China.

I thought it would be useful to sum up what Crandall et al. found, so you don’t have to read the whole thing. Although this study looks at IM clients, there are certainly findings that can be extrapolated across public services, such as Baidu and Sina Weibo.

So first, let’s look at the results.


General government guidelines

The first thing to say is that there’s little overlap between the keyword lists each client picks out for censorship and surveillance.

Of 4,256 unique keywords, only 138 terms were shared in common between TOM–Skype and Sina UC. This lack of overlap suggests that government authorities don’t provide a standard keyword list.
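To make the overlap figure concrete, here is a minimal sketch of how two keyword lists could be compared. The toy lists and the `overlap_stats` helper are my own placeholders, not the study's data or code, and the final calculation assumes the 4,256 figure is the combined set of unique keywords across both clients.

```python
def overlap_stats(list_a, list_b):
    """Compare two keyword lists and report how much they share."""
    a, b = set(list_a), set(list_b)
    shared = a & b          # keywords present in both lists
    combined = a | b        # every unique keyword across both lists
    return {
        "shared": len(shared),
        "unique_total": len(combined),
        "share": len(shared) / len(combined) if combined else 0.0,
    }

# Placeholder keywords only; the real lists are not reproduced here.
tom_skype = ["k1", "k2", "k3"]
sina_uc = ["k3", "k4"]
stats = overlap_stats(tom_skype, sina_uc)

# The figures reported above: 138 shared out of 4,256 unique keywords.
reported_share = 138 / 4256  # roughly 3% of all unique keywords
```

On that reading, only around 3% of the unique keywords appear on both clients' lists.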

The paper states that previous studies have similarly found little consistency in the implementation of censorship in Chinese blog services (MacKinnon, 2009) and search engines localised for the Chinese market (Villeneuve, 2008a).

How this flexibility arises is a question for further research. Although there are some themes that occur regularly… 

So what keywords were under surveillance/censored? 

The keyword list related variously to Chinese politics (human rights etc), socially sensitive content (pornography, gambling etc), people (dissidents), sensitive events, technology (spyware, URLs etc) and other miscellaneous topics. 

Some keywords were particularly specific, such as instructions and locations related to Jasmine Rallies (pro–democracy protests which took place across China in early 2011 after the Arab Spring). 

Also included were specific names of dissidents and, interestingly, neologisms (nascent words that often vary by region and are often used when discussing sensitive issues). 

Such narrow targeting matters because it can create a kind of ‘filter bubble’. How will sensitive issues be discussed or researched with it in place?

Within the ‘people’ theme, the most prominent category references members of the CPC. Keywords in the ‘social’ theme mostly related to illicit goods and services, including narcotics, weapons and counterfeit goods, and to ‘prurient interest’, which was mostly pornography and prostitution.

On the flip side, some very general terms were included, for example, TOM–Skype lists included ‘Chinese people’ and ‘internet’.

With surveillance this broad, there’s a question as to how all this data is used. Is this simply another variable to add into the mix, qualifying some other data points? 

Below you can see a couple of figures showing political and events categories, and the sorts of keywords that were subject to surveillance and in some cases censorship. 


Swift updates to keywords under surveillance or censorship

One feature of both sets of keywords was the speed at which they were updated to take account of current events.

Here’s a good example:

In March 2012, the son of a high ranking CPC official was killed in a car crash outside of Beijing. Two women in the car were injured, and all three were reported to be naked. Photographs of the Ferrari began circulating online.

The incident was politically sensitive because supercars are apt to stir up public criticism over government corruption and inappropriate behaviour.

Within a day of the crash, reports emerged that searches for ‘Ferrari’ and other related terms had been blocked on Sina Weibo, Baidu and Soso (Dao, 2012). Chinese state media did initially publish a story about the crash, but a Global Times article was later removed.

TOM–Skype surveillance keyword lists were updated within a few days of the event.
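As a rough illustration of how update speed could be measured from periodically collected lists, the sketch below diffs two snapshots and computes the lag between an event and a keyword's first appearance. The snapshot dates, keywords and function names are all invented placeholders, not values or code from the study.

```python
from datetime import date

def diff_snapshots(old, new):
    """Return (added, removed) keywords between two list snapshots."""
    old_set, new_set = set(old), set(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)

def first_seen(snapshots, keyword):
    """Earliest snapshot date on which a keyword appears, or None."""
    for day, keywords in sorted(snapshots, key=lambda s: s[0]):
        if keyword in keywords:
            return day
    return None

# Invented placeholder snapshots: (collection date, keyword list).
snapshots = [
    (date(2000, 1, 1), ["alpha", "beta"]),
    (date(2000, 1, 4), ["alpha", "beta", "gamma"]),
]

added, removed = diff_snapshots(snapshots[0][1], snapshots[1][1])

# Days between a hypothetical event and the keyword first being listed.
event_day = date(2000, 1, 2)
lag_days = (first_seen(snapshots, "gamma") - event_day).days
```

The measured lag can only be as precise as the collection interval: a keyword added between two snapshots is dated to the later one.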

How does this affect companies?

Publishers of news or internet portals rely on cooperation with state–owned media for content. Sina Corp.’s annual report for 2011 spells out the risk:

[T]he PRC government has the ability to restrict or prevent state–owned media from cooperating with us in providing certain content to us, which will result in a significant decrease of the amount of content we can publish on our websites.

We may lose users if the PRC government chooses to restrict or prevent state–owned media from cooperating with us, in which case our revenues will be impacted negatively.

This reliance on state media leaves the CPC powerful enough to control the industry (perhaps precisely the opposite of past situations in the UK).

MacKinnon (2009) examined how 15 different Chinese blog service providers filter and delete posts. The conclusion was that censorship of user generated content in China is highly decentralized: companies are responsible for their own conduct, which has a big impact on how censorship and surveillance are carried out.

Is censorship receding?

The most recent updates (late 2011) to the censorship lists for most TOM–Skype versions reduced these lists to a single keyword, effectively eliminating censorship.

Surveillance–only lists are still active.

Similarly, in September 2012, most of the Sina UC lists were reduced to a single keyword. Only some usernames remained censored and under surveillance. It is notable that authorities have moved to enforce real name registration on microblogs and other online services.

It is possible that Sina UC has implemented surveillance on the server side that the study could not detect. The study posits that Chinese IM programs may be moving to surveillance of particular users and sensitive topics, while public platforms like Baidu and Sina Weibo experience greater pressure to filter and delete sensitive information.


Microsoft and Skype

Skype (now Microsoft owned) and TOM Online established a joint venture in 2005. In April 2011, Skype acknowledged perceived sensitivities:

… We understand that Tel–Online Limited is obligated by the government to provide this filtering and storage. We received significant negative media attention as a result of these practices, as well as a security failure relating to the storage of these instant messages. Further news reports concerning content filtering and the apparent lack of privacy of communications in China and other countries are attracting political attention in the United States and Europe. Such attention could develop into legislative action resulting in additional legal requirements being imposed on us.

Skype does not alert users to these potential risks. With Microsoft Corporation’s acquisition of Skype in October 2011 for $8.5B, there are now questions to be answered by Microsoft as to the transparency of data collection by its services.

Microsoft’s close collaboration with the NSA has somewhat changed the West’s perspectives on censorship and surveillance in China. Surveillance is undoubtedly happening on a wide scale in the US and Europe, and it can be hypothesized that censorship in China is waning, at least in IM clients.

What do state officials say?

The study quotes Wang Chen, then director of the Information Office of the State Council and the International Communication Office of the Communist Party of China (CPC).

The statement is vague enough to suggest the legal framework can be shifted as required:

We are following the overall thinking of combining Internet content management with industry management and security supervision; combining prior review and approval with supervision afterwards; combining technological blocking with public opinion guidance; combining hierarchical management with local management; combining government management with industry self–regulation; and combining online monitoring with off–line management. We have set up a pilot management system that integrates legal regulation, administrative supervision, industry self–regulation, and technological safeguards. 


It’s obviously unscientific to extrapolate the findings from this study across public platforms. However, there are many other studies out there. 

At the very least, surveillance of usage happens very broadly, and failure to self-censor can leave big platforms and news outlets stranded without access to government information and support. 

There must surely be many companies that take risks in the Chinese market, largely because of public fears around lack of transparency (see Microsoft), but recent NSA revelations mean that these fears have now spread across all markets.

Censored subjects are fairly open secrets, and the lists found in this study are not surprising. The internet cannot help but shed light on many markets and governments that were previously, to some degree or other, cloaked.


TOM–Skype and Sina UC were chosen because these two IM programs implement censorship (and surveillance in the case of TOM–Skype) inside the client software. Collection of the keyword lists began in April 2011 for TOM–Skype and August 2011 for Sina UC. Collection ended at the end of January 2013, with the latest changes occurring on 20 December 2012 (TOM–Skype) and 11 October 2012 (Sina UC).

The software binaries of the clients were reverse engineered to obtain the keyword list URLs and encryption keys. Each keyword was translated from Chinese to English by a fluent Chinese speaker and accompanied by a description of the political and social context behind it.