Interview: Get smarter with data

27 Nov 2006

IT Week: As chief scientist of the IBM Entity Analytic Solutions group, can you explain what semantic reconciliation is?

Jeff Jonas: It means to determine when two objects are the same despite having been described differently. When applied to people, it’s called identity resolution. For example, one record might say Bob Smith and one might say Robert Smith, but our technology can figure out whether it is the same person or not by comparing things like the address, telephone numbers, dates of birth, etc. It also figures out if people are related by checking whether they share the same address or phone number, for example, which can be very useful if you’re hunting for the bad guys.
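The matching idea described above can be illustrated with a minimal sketch. It is not IBM's implementation: the nickname table, the `Record` fields, the function names and the two-attribute threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical nickname table (assumption); a real system would need a far larger one.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william"}


@dataclass(frozen=True)
class Record:
    name: str
    address: str
    phone: str
    dob: str


def canonical_name(name):
    """Lower-case the name and map a known nickname to its formal form."""
    first, _, rest = name.lower().strip().partition(" ")
    return f"{NICKNAMES.get(first, first)} {rest}".strip()


def normalise(text):
    """Keep only letters and digits so '12 High St.' equals '12 high st'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def shared_attributes(a, b):
    """Return the set of non-empty fields on which the two records agree."""
    shared = set()
    for field in ("address", "phone", "dob"):
        left, right = normalise(getattr(a, field)), normalise(getattr(b, field))
        if left and left == right:
            shared.add(field)
    return shared


def same_person(a, b, min_shared=2):
    """Assumed rule: same canonical name plus at least min_shared matching fields."""
    names_match = canonical_name(a.name) == canonical_name(b.name)
    return names_match and len(shared_attributes(a, b)) >= min_shared


def possibly_related(a, b):
    """Different people who nevertheless share an address or phone number."""
    return not same_person(a, b) and bool({"address", "phone"} & shared_attributes(a, b))
```

Under these assumed rules, a "Bob Smith" and a "Robert Smith" record with the same address and date of birth resolve to one person, while a differently named record at the same address is reported only as possibly related.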

You have also coined the term “perpetual analytics”. What does this involve?

With perpetual analytics, the data finds the data, and the relevance finds the user. For example, a bank may catch and prosecute a fraudster but fail to find the insider who collaborated with him. Six months later that employee goes to HR and changes his address on the payroll system. If that address is the same as that involved in the fraud case, then our service will flag it up.

In most organisations currently, each operational system is like a silo. This means organisations do not know what they know: they cannot see horizontally across all the silos to understand the full context. And without the full context, decisions are made based on incomplete information, resulting in corporate amnesia. With our service, every time that key data changes in the enterprise, it is handed to our system. This new data is then analysed with reference to a database of all previous observations, and anything suspicious is flagged up.
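The "data finds the data" pattern can be sketched as follows: each new observation is checked against everything seen before at the moment it arrives, rather than waiting for someone to run a query. The class name, method name and the use of an address as the only matching key are assumptions for illustration, not a description of the IBM product.

```python
from collections import defaultdict


class ObservationStore:
    """Toy store that checks each new observation against all earlier ones."""

    def __init__(self):
        # Maps a normalised address to the (source, subject) pairs seen with it.
        self._by_address = defaultdict(list)

    def observe(self, source, subject, address):
        """Record an observation and return earlier ones from other systems
        that share the same address."""
        key = " ".join(address.lower().split())
        alerts = [
            (prev_source, prev_subject)
            for prev_source, prev_subject in self._by_address[key]
            if prev_source != source
        ]
        self._by_address[key].append((source, subject))
        return alerts


store = ObservationStore()
first = store.observe("fraud_case", "convicted fraudster", "4 Elm Road")
later = store.observe("payroll", "employee 17", "4 elm  road")
```

Here `first` is empty because nothing had been observed yet, while `later` contains the earlier fraud-case entry, which is the kind of cross-silo link the payroll change in the example above would raise.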

Who could use this service?

Any organisation that has a lot of data about identities scattered across multiple enterprise databases. For example, perpetual analytics has been used to measure organised retail theft, which is a $10bn-plus problem where criminal enterprises recruit store employees to steal certain products, which they then repackage and sell on. Before then, the scope of the problem was not really appreciated.

How is the data kept secure?

Any time an organisation transfers data between systems there is a risk of unintended disclosure. US data breach notification laws mean such incidents can hit the news and quickly damage a firm’s profits, brand and reputation. We created a technique that mathematically shreds data so it is non-human-readable and non-recognisable but is still usable for matching.

Encryption is OK, but there is the risk of the insider threat once the data has been decrypted. With our technique, all the analysis is done while the data is still in its shredded form, and if there is a match, all the system reports is that two records have something in common, along with pointers back to the original data. The original data holders keep their data and authorise each and every release. Many in the privacy community see it as a useful step forward.
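The interview does not say how the shredding is done. One common way to get values that are unreadable yet still comparable is a keyed one-way hash, so the sketch below uses HMAC-SHA256 purely as an assumed stand-in. The function names, the shared key and the record pointers are hypothetical.

```python
import hashlib
import hmac


def shred(value, key):
    """Turn a value into an irreversible token; equal inputs give equal tokens."""
    normalised = " ".join(value.lower().split())
    return hmac.new(key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymise(records, key):
    """Map each token to the holder's own record pointer.

    records maps a pointer (e.g. 'bank#1') to a sensitive value. For
    simplicity this sketch assumes values are unique within one holder.
    """
    return {shred(value, key): pointer for pointer, value in records.items()}


def matching_pointers(tokens_a, tokens_b):
    """Compare tokens only, and return pointer pairs for the matches."""
    common = sorted(tokens_a.keys() & tokens_b.keys())
    return [(tokens_a[token], tokens_b[token]) for token in common]


key = b"shared-secret"  # assumption: both data holders agree on this key
bank = anonymise({"bank#1": "4 Elm Road", "bank#2": "9 Oak Avenue"}, key)
retailer = anonymise({"shop#7": "4 elm road"}, key)
matches = matching_pointers(bank, retailer)
```

The matching step sees only tokens and pointers. To learn what `bank#1` or `shop#7` actually contains, someone would have to ask the respective data holder, which mirrors the release step described above.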
