Microsoft withdraws facial recognition database of 100,000 people

'MS Celeb' database had scraped images and videos published online under a Creative Commons open-source licence

Microsoft has withdrawn a facial recognition database featuring 10 million images of 100,000 people following claims that it was being used by the military and by the companies in China behind the repressive surveillance in Xinjiang province.

The database, labelled MS Celeb, was published in 2016. Microsoft claimed that it was the largest facial recognition dataset in the world.

However, the individuals featured in the database had not been asked for their consent. Instead, Microsoft had scraped the images from search engines and videos published online under a Creative Commons licence.

The database was taken down after the Financial Times (FT) reported on it. "The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed," claimed Microsoft in a statement to the FT.

Microsoft had labelled the database 'Celeb' to suggest that the faces were of publicly known figures, but it also included a number of private individuals, including journalists. People in the database contacted by the FT claimed no knowledge of their inclusion.

According to the FT, Microsoft's MS Celeb database had been used by a number of companies, including IBM, Panasonic, Nvidia and Hitachi, as well as SenseTime and Megvii in China.

The latter two companies are involved in the Chinese government surveillance system installed in the province of Xinjiang where ethnic Uyghur people are closely monitored. More than one million people are believed to be held in internment camps - although the Chinese government claims that they are training centres.

MS Celeb is not the only facial recognition dataset to have been published online. Other datasets have since been removed, including one set up by researchers at Duke University and another, called Brainwash, by Stanford University.

The datasets had been discovered by Adam Harvey, who runs the Megapixels project which tracks different databases of personal information and how they are used.

Harvey warned that although Microsoft had taken the database offline, it is still being widely shared by people and groups who had downloaded it.

"People are posting it on GitHub, hosting the files on Dropbox and Baidu Cloud, so there is no way from stopping them from continuing to post it and use it for their own purposes," Harvey told the FT.
