NLP to increase diversity in text analysis

2 min read
Felicia Ziparo, Methods Analytics

More and more articles, blogs and videos mention natural language processing (NLP) as a tool for extracting information from vast amounts of text. What they rarely mention is that NLP can also be used to increase diversity and reduce bias in text analysis.

NLP is a branch of data science that enables automated processes to analyse and extract meaningful insights from human language. It can supplement manual processing, drawing out insights that might otherwise be missed and cutting the manual work that adds no value. Done well, it could reduce the cost of many operations while improving the quality of the outcomes.

Our white paper, Gaining Greater Insights from Public Consultations with Data Science & NLP, explores how data science techniques can be applied to public consultations, with a particular focus on how they can help people get more information out of a time-consuming task while reducing bias in the analysis. In practice, techniques such as Topic Modelling and Named Entity Recognition can help the reader extract the main themes present in the text, while highlighting the context in which they are used. Named Entity Linking connects those entities to a knowledge graph to acquire additional information such as definitions, aliases and conceptual categories. This also gives entities context by creating connections and associations, while accounting for permutations and synonyms.

The reduction in bias follows from this process: contributions from individuals who use less frequent terms or keywords are still counted in the analysis, because their wording is linked to the same entity as the more common terms. Going further, we could learn from opinions and text that are normally excluded because they do not meet the set frequency threshold, making sure these are addressed where relevant. This would ensure more voices are heard, generating fairer and more in-depth results than has previously been possible using traditional technologies and techniques.
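
To make the frequency-threshold point concrete, here is a minimal Python sketch. It is not the method from the white paper: the alias table is a hypothetical stand-in for a knowledge graph lookup, the responses are invented, and the function names and the threshold of 2 are assumptions chosen for illustration. It shows how linking different wordings to one entity can lift a theme over a threshold that each individual wording would miss.

```python
from collections import Counter

# Hypothetical alias table standing in for a knowledge graph lookup:
# each surface form maps to one canonical entity.
ALIASES = {
    "nhs": "National Health Service",
    "national health service": "National Health Service",
    "the health service": "National Health Service",
    "gp surgery": "General practice",
    "family doctor": "General practice",
    "general practice": "General practice",
}

# Invented consultation responses, for illustration only.
RESPONSES = [
    "The NHS needs more funding.",
    "I rely on the health service every week.",
    "The National Health Service is stretched.",
    "My family doctor is hard to reach.",
    "Our GP surgery closed last year.",
]


def count_mentions(responses, link_entities):
    """Count how many responses mention each term (once per response).

    With link_entities=False the raw surface forms are counted;
    with link_entities=True each surface form is replaced by its
    canonical entity before counting.
    """
    counts = Counter()
    for text in responses:
        lowered = text.lower()
        found = set()
        # Naive substring matching; a real pipeline would use an NER model.
        for surface, canonical in ALIASES.items():
            if surface in lowered:
                found.add(canonical if link_entities else surface)
        counts.update(found)
    return counts


def themes_above(counts, threshold):
    """Return the terms mentioned in at least `threshold` responses."""
    return {term for term, n in counts.items() if n >= threshold}


raw_themes = themes_above(count_mentions(RESPONSES, link_entities=False), 2)
linked_themes = themes_above(count_mentions(RESPONSES, link_entities=True), 2)
print("Without linking:", raw_themes)
print("With linking:", linked_themes)
```

In this toy data every wording appears only once, so no theme survives the threshold on raw counts. After linking, both entities pass it, and the respondents who used the less common wording are included in the analysis.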

The keywords, organisations and people that the public cite, along with the sentiments expressed in responses to open questions, can vary across demographic, economic and geographic groups. Knowing the data better makes it possible to account for underrepresentation when building a model, and to test the algorithm to check that minority groups are not adversely affected. Standardising the text and reducing human bias can help increase diversity in consultation responses while improving policies and government replies.
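
One simple way to test an algorithm along these lines is to compare its accuracy for each group against its overall accuracy. The sketch below assumes a hand-labelled sample of responses is available. The group names, labels, function names and the 0.1 tolerance are all hypothetical, and a real evaluation would need larger samples and more than one metric.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute accuracy per group.

    records: iterable of (group, expected_label, predicted_label) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, expected, predicted in records:
        totals[group] += 1
        correct[group] += int(expected == predicted)
    return {group: correct[group] / totals[group] for group in totals}


def flag_underperforming(records, tolerance=0.1):
    """List groups whose accuracy is more than `tolerance` below overall."""
    records = list(records)
    overall = sum(exp == pred for _, exp, pred in records) / len(records)
    per_group = accuracy_by_group(records)
    return sorted(g for g, acc in per_group.items() if acc < overall - tolerance)


# Invented sample: (respondent group, hand-labelled theme, model's theme).
SAMPLE = [
    ("urban", "transport", "transport"),
    ("urban", "housing", "housing"),
    ("urban", "health", "health"),
    ("urban", "transport", "transport"),
    ("rural", "transport", "transport"),
    ("rural", "health", "housing"),  # misclassified response
]

group_accuracy = accuracy_by_group(SAMPLE)
flagged = flag_underperforming(SAMPLE)
print(group_accuracy)
print("Groups to review:", flagged)
```

The smaller group here is also the one the model handles worst. A single overall accuracy figure would hide that pattern, which is why the check is done per group.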

NLP has also been used to reduce bias in other fields, for example by matching skills in recruitment in the US. While this topic can be controversial, the aim is to use NLP to standardise the skills listed in CVs and successfully match people from different backgrounds to open positions. Testing the algorithm is a crucial aspect of this process, as is making sure that the tool is not used to make automatic decisions about candidates.

As the above example shows, ethics plays an important role in the field of AI. I am a great advocate of AI being used to augment human processes rather than replace them, allowing for greater scrutiny of feedback rather than less.



Felicia Ziparo is Lead Data Scientist at Methods Analytics and a finalist in the Team Leader of the Year category at the upcoming Women in Technology Excellence Awards.
