A professor of linguistics is offering a tool to boost web search accuracy
Language experts make Sense of web searching

Crystal Semantics' Sense Engine technology marries dictionary and encyclopaedia information

Written by Mark Samuels

David Crystal OBE spent three years analysing 100,000 words from the English dictionary and found an average of 2.5 meanings associated with each.

The professor of linguistics and chairman of UK-based internet search business Crystal Semantics has used this knowledge to build technology that has helped boost web search accuracy from 20 per cent to 95 per cent.

Crystal, who is honorary professor of linguistics at the University of Wales, Bangor, has published more than 100 books and was responsible for compiling an encyclopaedia of facts and figures for Cambridge University Press in the early 1990s.

As the encyclopaedia's database grew, Crystal found that he needed a way to mine the information it held. His technology team began devising a broad classification system.

As more and more books were published, the importance of this classification system grew. Its significance developed further in the mid-to-late 1990s, as Crystal and his colleagues saw the direction of the business alter many times.

First, Cambridge University Press changed its publishing policy and sold the encyclopaedia referencing division to Dutch IT publishing house AND. Crystal found that the new owner was keen to place more emphasis on its referencing technology.

"Our remit now was something different. We had to develop the classification system as an end in itself," he explained.

AND was eager to provide a global classification for the world's information, and keen to classify the web's data. "We found it very difficult to find what we wanted," said Crystal. "So we proposed better technologies for searching."

Crystal's idea was simple: internet searching could be improved by anticipating all the important definitions associated with a single term, and by filtering internet searching according to these relevant definitions.
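The idea can be illustrated with a minimal sketch. It is not Crystal Semantics' implementation: the sense inventory, the cue words and the function names below are all invented for illustration, and the scoring is a simple word-overlap count.

```python
import re

# Hypothetical sense inventory: each sense of a term lists cue words
# that tend to appear near it. Invented for illustration only.
SENSES = {
    "jaguar": {
        "animal": {"cat", "jungle", "prey", "species", "wildlife"},
        "car": {"engine", "dealer", "saloon", "drive", "model"},
    },
}


def tokenize(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def best_sense(term, text):
    """Return the sense whose cue words overlap most with the text, or None."""
    words = tokenize(text)
    scores = {sense: len(cues & words) for sense, cues in SENSES[term].items()}
    sense, score = max(scores.items(), key=lambda kv: kv[1])
    return sense if score > 0 else None


def filter_results(term, wanted_sense, documents):
    """Keep only the documents that use the term in the wanted sense."""
    return [d for d in documents if best_sense(term, d) == wanted_sense]


docs = [
    "The jaguar is a big cat that stalks prey in the jungle.",
    "The new Jaguar saloon has a quieter engine.",
]
car_pages = filter_results("jaguar", "car", docs)
```

Here a search for the car sense of 'jaguar' keeps only the second document, because the first shares no cue words with that sense.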

Between 1997 and 2000, AND spent about $8m (£4.1m) to help fund Crystal and his team of 30 lexicographers as they analysed 100,000 words in the English dictionary.

But the dotcom crash hit the innovative company hard; in March 2001, AND was forced to liquidate its UK subsidiaries, including Crystal's referencing system.

"With all this investment and skill, we realised that we had to make a go of it," he said. Crystal and his colleagues acquired AND's assets for £4m, and established their own business to exploit the potential of the classification technology.

Two sources of cash immediately became apparent. Penguin started publishing the encyclopaedia of facts that Crystal had compiled for Cambridge University Press, with Crystal himself helping the publisher license the data.

Penguin also decided to push the encyclopaedia's associated database online, and the subscription-based model was used to feed content to education groups. "That provided enough cash to keep us going," said Crystal.

The professor and his colleagues sought capital from private investors and public sources, such as the Welsh Development Agency, and raised £1m. They then turned to the commercial development of the classification system.

"We really had to start thinking everything through carefully," he explained. "In short, it's about marrying dictionary and encyclopaedia information. Once you explain the technology, people see the point of it straight away."

Unlike current search technologies, which are based solely on statistical algorithms, Crystal Semantics' Textonomy is based on the relationship between words and the contexts in which they occur.

The Sense Engine that drives Textonomy came out of a search linguistics development programme, which has produced a collection of tools for search and navigation, e-commerce and contextual advertising. Large companies have already expressed an interest.

"We provide enabling technology, and blue chip companies in the US are looking at how these applications will give them competitive advantage," said Ian Saunders, managing director of Crystal Semantics.

Search engine providers in particular see the benefit of the company's patented technology: Crystal Semantics recently signed a classification contract with a local search provider in the UK, Yell.

"It's a crowded marketplace, but each of the major search engines is looking for differentiation," said Saunders. "There's a belief that technology is always the answer; but actually companies often don't spend enough time looking at what their users want."

How it works

The Sense Engine is the basis for four different applications from Crystal Semantics. Textonomy Reveal is a search engine assistant that works alongside the categorisation tool Textonomy Select to classify every visited website against 2,000 themes.

Building on these applications, Crystal has developed tools for advertising and e-commerce. Advertising is often placed on websites according to a trigger system, and software attempts to place adverts when key words in the text become apparent.

But the results can be hit-and-miss: Crystal recently saw an advert for gaffer tape on a football site because the story in question mentioned the word 'gaffer'. "Adverts can be dynamically related and will be entirely contextual," he said.
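The gaffer tape mismatch can be reproduced in a few lines. The sketch below contrasts a plain keyword trigger with a check on the page's theme. The themes, cue words and ad inventory are hypothetical, and the theme detection is a crude overlap count rather than anything the Sense Engine is known to do.

```python
import re

# Hypothetical theme cue words and ad inventory, invented for illustration.
THEME_CUES = {
    "football": {"match", "goal", "club", "striker", "league"},
    "stagecraft": {"lighting", "cable", "stage", "rig", "crew"},
}
ADS = [
    {"product": "gaffer tape", "trigger": "gaffer", "theme": "stagecraft"},
]


def words_of(text):
    """Lowercase the text and return its alphabetic words as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_ads(text):
    """Trigger-style matching: return any ad whose trigger word appears."""
    words = words_of(text)
    return [ad["product"] for ad in ADS if ad["trigger"] in words]


def page_theme(text):
    """Return the theme with the most cue words in the text, or None."""
    words = words_of(text)
    theme = max(THEME_CUES, key=lambda t: len(THEME_CUES[t] & words))
    return theme if THEME_CUES[theme] & words else None


def contextual_ads(text):
    """Return an ad only if its trigger appears and the page theme fits."""
    words = words_of(text)
    theme = page_theme(text)
    return [
        ad["product"]
        for ad in ADS
        if ad["trigger"] in words and ad["theme"] == theme
    ]


football_story = "The gaffer praised his striker after the club won the match."
stage_story = "The gaffer taped each cable to the stage before the lighting crew arrived."
```

With these inputs, the keyword trigger places the tape advert on the football story, while the theme check places it only on the stage story.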

Crystal's e-commerce application, Textonomy Deduce, uses lexicon classification to improve business processes, and concentrates on product searching.

An individual searching a shop's online catalogue with the term 'mobile phone' or 'cell phone' may find no results because the system relies on a more precise description, such as 'cellular phone'.
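One simple way to picture the fix is a lookup that maps a customer's phrasing onto the catalogue's own term before searching. This is a sketch under stated assumptions, not a description of Textonomy Deduce: the synonym table, the catalogue entries and the function names are all made up for the example.

```python
# Hypothetical synonym table mapping customer phrasing to the catalogue's term.
CANONICAL = {
    "mobile phone": "cellular phone",
    "cell phone": "cellular phone",
}

# Hypothetical catalogue entries.
CATALOGUE = [
    "Cellular phone, black",
    "Cordless kettle",
    "Cellular phone charger",
]


def naive_search(query, catalogue=CATALOGUE):
    """Match the query text literally against each catalogue entry."""
    q = query.lower()
    return [item for item in catalogue if q in item.lower()]


def normalise(query):
    """Collapse whitespace, lowercase, and swap in the catalogue's term."""
    q = " ".join(query.lower().split())
    return CANONICAL.get(q, q)


def search(query, catalogue=CATALOGUE):
    """Search using the normalised form of the query."""
    return naive_search(normalise(query), catalogue)


literal_hits = naive_search("mobile phone")
mapped_hits = search("Mobile  Phone")
```

The literal search for 'mobile phone' finds nothing, while the normalised search returns both cellular phone entries.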

Three years of analysing the associated meanings of 100,000 words in the English dictionary mean that customer searches can be more relevant and direct.
