Connecting the dots at HMRC

When HMRC was formed in 2005, its fraud detection systems were arguably far from cutting edge. However, a new system that can work with more data than ever - from different tax systems - is paying for itself many times over.

When HMRC's data analytics team stepped up to collect the award for Best Big Data Project at the UK IT Industry Awards in November for its Connect data warehousing and analysis project, it was the culmination of an initiative conceived as far back as 2005, when the Inland Revenue was merged with Customs & Excise.

The merger was the long-overdue catalyst for a rethink of the way both organisations handled fraud detection. Data warehouse-based fraud detection systems of the kind commonly used in the banking industry were not unknown to the taxman, but its implementations had been heavily siloed.

For example, there was a system for detecting VAT fraud, but it focused solely on data drawn from the VAT system; likewise, there was a separate system for analysing self-assessment tax returns. Yet key markers of fraud might come from anywhere: for instance, a mismatch between the figures a company provides in its corporation tax return and the trading it claims underpins its VAT return.

"There were numerous systems in use, but none of them was integrated," says Mike Hainey, head of the risk and intelligence service data analytics team at HMRC. "To 'risk assess' on a broader front, you would have to dip in and out of these marts and have some very skilled people make the joins between the information sets. That was not very effective."

Furthermore, many far subtler markers of fraudulent activity simply couldn't be pursued, because staff in the Enforcement and Compliance department were unable to "play" with the data. It could take weeks or months for specialist computer staff to build a data mart that would let compliance staff investigate a new area, or an existing one in a new way.

As a result, HMRC was unable to get a truly rounded picture of individual taxpayers or businesses - a genuine, single "view" of the "customer".

The all-seeing eye

The merger of the two revenue-collecting departments offered the opportunity to put together a business case for a more sophisticated approach, using the latest data warehousing technology and analysis tools.

First, HMRC assembled a team drawn from both "sides" of the department, together with Aspire, the organisation that runs HMRC's outsourced IT, and the services company Detica.

The team analysed the software options on the market before approaching vendors to run a pilot. This involved delving more deeply into the data for a particular - unnamed - county, analysing data on both individuals and companies to assess how effective the approach might be country-wide.

The aim was to identify the real-world "views" that would enable HMRC to make the most sense of the data from a fraud and evasion perspective. Entities were defined so that the data could be analysed in different ways: an individual would be one entity, says Hainey, a family another, and a company another still.

"So we identified real-world entities in which data clusters around, and then looked at the commonality in those areas that link those entities together," he says. 

From that, it would be much easier to identify someone who was the director of a number of companies, along with his family connections, the companies of which his wife is a director, say, and any family trusts. The data, in other words, could be clustered around these entities.
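The article does not say how Connect performs this linking internally. As a rough illustration only, the Python sketch below assumes that relationships such as directorships, marriages and trusteeships are already available as pairs of entity identifiers, and groups everything that is connected, directly or indirectly, using a union-find (disjoint-set) structure. The identifiers, relationship labels and the `cluster_entities` function are all hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure: each entity points towards a representative root."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        # Register unseen entities as their own root.
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent to keep trees shallow.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_entities(links):
    """Group entities connected, directly or indirectly, by any link.

    `links` is an iterable of (entity_a, relationship, entity_b) triples; the
    relationship label is kept for readability but ignored when clustering.
    """
    uf = UnionFind()
    for a, _relationship, b in links:
        uf.union(a, b)
    clusters = defaultdict(set)
    for entity in list(uf.parent):
        clusters[uf.find(entity)].add(entity)
    # Sort members and clusters so the output is deterministic.
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Hypothetical example data, loosely mirroring the director/spouse/trust case.
links = [
    ("person:A", "director_of", "company:Alpha Ltd"),
    ("person:A", "director_of", "company:Beta Ltd"),
    ("person:A", "spouse_of", "person:B"),
    ("person:B", "director_of", "company:Gamma Ltd"),
    ("person:B", "trustee_of", "trust:Family Trust"),
    ("person:C", "director_of", "company:Delta Ltd"),
]
clusters = cluster_entities(links)
```

Here the first cluster holds person A, person B, their three companies and the family trust, while person C and Delta Ltd form a separate cluster. A real system would also have to decide when two records refer to the same real-world entity, which this sketch simply takes for granted.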

"We started to see low-hanging fruit early on in certain areas and in terms of spotting certain trends and patterns," says Hainey. "That made us realise that it was actually delivering a quality product in the area of spotting fraud indicators that we previously hadn't seen because we were suddenly aligning other data sets that, combined, was telling us a different story."

Not only was the pilot project a success, he adds, but it also highlighted a number of areas where HMRC could achieve an almost immediate yield. Indeed, what was learnt in the pilots was quickly fed back into the "business" so that action could be taken.

It also helped to build a strong business case covering not only the system itself but also its ongoing running costs, which had to be put to an internal investment committee for approval.

The front end of the system comprises two tools: what HMRC calls the Integrated Compliance Environment (ICE), a graphical tool from Detica that enables investigators to assemble information around entities; and the Analytical Compliance Environment (ACE), which analysts and statisticians use to build risk profiles and interrogate large volumes of data.

At the back end, SAS Institute provides the data warehousing, while DAN - the Data Acquisition and Networking system - provides the extraction, transformation and loading (ETL) capabilities, taking data items in different formats and transforming them into a structure that the data warehouse can store and make available for analysis.
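The article does not describe DAN's internals. Purely as an illustration of the extract-transform-load pattern, the Python sketch below assumes two hypothetical extracts, a pipe-delimited VAT file and a corporation tax feed with differently named fields, maps both into one common record type, and loads them into an in-memory dictionary keyed by entity and year, which stands in for the warehouse. It then runs a naive version of the VAT-versus-corporation-tax turnover comparison described earlier. All field names, formats, identifiers and the 5% tolerance are assumptions, not details of HMRC's system.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class TurnoverRecord:
    """Hypothetical common schema that every source format is transformed into."""
    entity_id: str
    source: str
    year: int
    turnover: Decimal


def transform_vat_line(line: str) -> TurnoverRecord:
    """Hypothetical VAT extract: one pipe-delimited 'entity_id|year|turnover' line."""
    entity_id, year, turnover = line.strip().split("|")
    return TurnoverRecord(entity_id, "VAT", int(year), Decimal(turnover))


def transform_ct_row(row: dict) -> TurnoverRecord:
    """Hypothetical corporation tax extract: a dict with its own field names."""
    return TurnoverRecord(
        entity_id=row["company_ref"],
        source="CT",
        year=int(row["accounting_year"]),
        turnover=Decimal(row["turnover_pounds"].replace(",", "")),
    )


def load(records):
    """Load records into a dict keyed by (entity_id, year), then by source."""
    store = {}
    for rec in records:
        store.setdefault((rec.entity_id, rec.year), {})[rec.source] = rec.turnover
    return store


def turnover_mismatches(store, tolerance=Decimal("0.05")):
    """Flag (entity_id, year) keys where VAT and CT turnover differ by more
    than `tolerance` as a fraction of the larger figure."""
    flagged = []
    for key, by_source in sorted(store.items()):
        if "VAT" in by_source and "CT" in by_source:
            vat, ct = by_source["VAT"], by_source["CT"]
            larger = max(vat, ct)
            if larger and abs(vat - ct) / larger > tolerance:
                flagged.append(key)
    return flagged


# Hypothetical sample extracts in two different formats.
vat_lines = ["C001|2011|500000.00", "C002|2011|120000.00"]
ct_rows = [
    {"company_ref": "C001", "accounting_year": "2011", "turnover_pounds": "500,000.00"},
    {"company_ref": "C002", "accounting_year": "2011", "turnover_pounds": "80,000.00"},
]
records = [transform_vat_line(line) for line in vat_lines] + [transform_ct_row(row) for row in ct_rows]
store = load(records)
```

With this sample data, `turnover_mismatches(store)` flags only `("C002", 2011)`, where the two returns disagree by a third of the larger figure. A real risk engine would weigh many such indicators rather than rely on a single threshold.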

A feast of data

As well as providing a single view of "the customer", the system can incorporate a wide range of data from other sources, says Hainey. "It's departmental data at one end of the spectrum, commercial data - bought-in information."
