AI companies replacing low-cost data labellers with a mix of automation and experts

As models mature, data quality has become more of a priority

Top AI companies are rapidly replacing low-cost data labellers in Africa and Asia with highly paid domain experts in a bid to build more powerful and intelligent models.

According to a Financial Times report, firms such as Scale AI, Turing, and Toloka are leading the charge, hiring professionals in fields like biology, finance, physics, and software engineering to develop the next generation of AI training datasets.

This comes amid the rise of advanced "reasoning" models, including OpenAI's o3 and Google's Gemini 2.5, which demand increasingly complex and high-quality data to function effectively.

For years, the AI industry depended heavily on gig economy workers in countries such as Kenya and the Philippines, where data labellers were paid less than $2 per hour to perform repetitive tasks like drawing bounding boxes around objects in images, filtering graphic content, and refining the phrasing of text.

These workers, often under intense pressure to complete hundreds of microtasks a day, formed the invisible backbone of AI development.

Now, that backbone is being restructured.

"The AI industry was for a long time heavily focused on the models and compute, and data has always been an overseen part of AI," said Olga Megorskaya, CEO and co-founder of Netherlands-based Toloka.

"Finally, [the industry] is accepting the importance of the data for training."

As basic annotation tasks are increasingly automated, the demand for low-cost human labellers is declining sharply. Instead, AI companies are investing in specialists capable of curating nuanced datasets tailored for domain-specific reasoning.

AI companies now require experts to demonstrate chain-of-thought reasoning, solve real-world problems step-by-step, and simulate complex scientific theories.

For example:

  1. A physicist might design a theoretical experiment;
  2. A software engineer might code a simulator to test it;
  3. A data scientist might analyse the output.

"The result of this is the model's not just going to be better than a physicist. It's going to be better than a superposition of somebody who's at the top in physics, computer science and data science," Jonathan Siddharth, co-founder and CEO of Turing AI, explained.

Experienced software engineers are also being asked to create domain-relevant tasks, solve them by writing and debugging code, and assess outcomes for security risks.

The change in strategy has triggered a surge in investor enthusiasm. In June, Meta poured $15 billion into Scale AI, doubling the company's valuation to $29 billion.

Earlier, in March, Turing AI raised $111 million at a $2.2 billion valuation. And in May, Jeff Bezos' personal investment firm, Bezos Expeditions, led a $72 million funding round for Toloka.

These capital injections reflect a growing belief that better data, not just better models, will be the key differentiator in the competitive AI arms race.

To recruit top-tier talent, Turing offers experts salaries 20-30% higher than what they earn in their current roles. While only around 10-15% of AI budgets go to data, the sheer scale of AI investment means these sums are still "enormous", Siddharth noted.

Although demand for simpler tasks is declining, some opportunities remain available for gig workers.

According to Joan Kinyua, president of the Data Labellers Association in Kenya, local workers are now concentrating on tasks that require localised language knowledge.

In addition, some human labellers are still being assigned final quality control checks to evaluate and validate AI-generated content.