Date: 2019-06-06

Microsoft has withdrawn a facial recognition database featuring 10 million images of 100,000 people following claims it was being used by the military and the companies in China behind the repressive surveillance in Xinjiang province.

The database, labelled MS Celeb, was published in 2016. Microsoft claimed that it was the largest facial recognition dataset in the world.

However, the individuals featured in the database had not been asked their consent. Instead, Microsoft had scraped the images from search engines and videos published online under a Creative Commons licence.

The database was taken down after the Financial Times revealed its existence. "The site was intended for academic purposes. It was run by an employee that is no longer with Microsoft and has since been removed," claimed Microsoft in a statement to the FT.

Microsoft had labelled the database 'Celeb' to suggest that the faces were of publicly known figures, but it also included a number of private individuals, including journalists. People in the database contacted by the FT claimed no knowledge of their inclusion.

According to the FT, Microsoft's MS Celeb database had been used by a number of different companies, including IBM, Panasonic, Nvidia and Hitachi, as well as Sensetime and Megvii in China.

The latter two companies are involved in the Chinese government surveillance system installed in the province of Xinjiang, where ethnic Uyghur people are closely monitored. More than one million people are believed to be held in internment camps, although the Chinese government claims that they are training centres.

MS Celeb is not the only facial recognition dataset to have been published online. Other datasets have since been removed, including one set up by researchers at Duke University and another, called Brainwash, by Stanford University.

The datasets had been discovered by Adam Harvey, who runs the Megapixels project, which tracks different databases of personal information and how they are used.

Harvey warned that although Microsoft had taken the database offline, it was still being widely shared by people and groups who had downloaded it.

"People are posting it on GitHub, hosting the files on Dropbox and Baidu Cloud, so there is no way from stopping them from continuing to post it and use it for their own purposes," Harvey told the FT.

