06 Mar 1997
It used to be said of universities that their research was on track if it encompassed three elements - the serious, the speculative and the downright nutty.
In these competitive, cash-conscious days, the big IT R&D labs tend to be highly 'value focused'. Researchers are given their heads only when their bosses can see a marketable end in sight. The nutty is out, but that still leaves the serious and the speculative, although some would argue that there is no such thing as truly speculative work without projects that any sane holder of the R&D purse-strings would instantly consign to the bin.
BT Labs at Martlesham Heath, near Ipswich, is one of the best endowed labs in the UK. Home to several thousand staff, this laboratory is doing work that is likely to have a major impact on homes and businesses throughout the UK, once BT is free to offer online services.
Work from BT Labs has, of course, already made an impact. In common with major telco providers around the globe, BT has a keen interest in interactive speech technology and its usefulness in a range of voice-activated public and business services. BT Labs is also investigating products and areas as diverse as gesture recognition, interactive TV and an encryption system based on quantum mechanics.
Speech recognition is one of the lab's longest-running projects. Simon Ringland, manager of speech recognition research at BT Labs, says work began in this field a year or two before he arrived a decade ago. BT's Call Minder service, rolled out in May 1995, was one of the first major products to emerge from this research. 'This service is wholly based on speech recognition and it already has between half a million and a million registered users,' Ringland says.
To anyone leaving a message on the system, Call Minder seems like any ordinary answerphone service. But when subscribers want to hear messages left on their number, they go through a speech recognition 'walk-through', or menu, to have messages replayed. It sounds simple enough until you realise that this is a national service that has to embrace the huge range of UK accents.
'Right now, the strongest accents might defeat the system, but it works for the vast majority of people,' Ringland says. Staff have put a great deal of work into understanding and engineering the flow of dialogue between the system's speech recogniser and end users. The speech recogniser now has the ability to recover from problems, to prompt the user for a clearer message and to convey what it expects from the user at any point in the dialogue.
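The recovery behaviour Ringland describes can be pictured as a re-prompting loop. What follows is only a minimal sketch, not BT's design: the function name, the confidence threshold, the retry limit and the `recognise` callback are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical recogniser interface: given the prompt played to the caller,
# it returns the best-guess word and a confidence score between 0 and 1.
Recogniser = Callable[[str], Tuple[str, float]]


def ask_with_recovery(
    prompt: str,
    options: Sequence[str],
    recognise: Recogniser,
    threshold: float = 0.6,  # assumed cut-off, not a BT figure
    max_tries: int = 3,      # assumed retry limit
) -> Optional[str]:
    """Ask for one of `options`, re-prompting when the reply is unclear."""
    for attempt in range(max_tries):
        if attempt == 0:
            spoken = prompt
        else:
            # Recover: ask again and say what the system expects to hear.
            spoken = "Sorry, please say one of: " + ", ".join(options) + "."
        word, confidence = recognise(spoken)
        if word in options and confidence >= threshold:
            return word
    return None  # give up, for example by handing over to an operator
```

In this sketch a reply is accepted only when it is both an expected word and confidently recognised; anything else triggers a prompt that spells out the valid choices.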
Going beyond this automated national answerphone service, BT Labs is currently working on a corporate directory service based on speech recognition.
'The idea is that you should be able to phone the company and have the system ask you for the name of the person you want to speak to - which means resolving problems like which Mr Smith is the right one for you,' Ringland says.
But it is not yet time for all the country's receptionists to start looking for alternative employment. According to Ringland, speech recognition is still hampered, not by a shortage of raw computing power, but by our lack of understanding of the fundamental human processes involved in the act of listening and understanding.
The next step, he says, is natural language speech recognition, where the system is able to respond appropriately to complex sentences such as 'Can you put me through to Peter please?', or to diary queries such as 'What do I have on next Wednesday?' When the researchers crack that problem it may be the turn of secretaries to follow receptionists out of the door. In the US, Ringland points out, the telco giant AT&T claims to have recently saved its first billion dollars using speech recognition.
People interact not only by speaking, of course, but through body language and gestures too. BT Labs has been working on gesture recognition in conjunction with one of the best-known US research establishments, the Massachusetts Institute of Technology (MIT).
Phil Sheppard is group leader of what BT Labs calls 'natural communication systems'. The most likely first real-life application for a gesture recognition-enabled system, Sheppard says, is virtual conferencing.
One of the difficulties of standard video conferencing is that once there are more than a handful of participants, spontaneous conversations between subsets become impractical since it becomes very difficult to tell who is talking to whom. The answer, according to Sheppard, is to give every participant an avatar and have them meet in a 3D virtual setting. Here the avatars can sidle up to one another, engage in eye contact, make 'over here' gestures, lean seductively against walls or float off and find someone else to talk to.
A working instance of this type of system was The Mirror - an interactive 3D world being put through its paces by a joint BT, BBC, Illuminations Television (producer of The Net) and Sony team. The avatars in this virtual world are still extremely simple, however, and BT's researchers are keen to go further.
'We are working on avatars that you can control in a natural way, using the full range of body cues that you would display in life,' Sheppard says. The idea is that a video camera records the user's facial expression - whether the mouth is smiling, for example, or the eyebrows raised in surprise. These gestures are interpreted by the system, which then generates the appropriate animation in the avatar. The goal is to approach the kind of emotional expressiveness films such as Toy Story have accustomed us to expect from virtual characters.
The gesture-tracking system has come out of a piece of work by MIT Labs, with BT sponsorship. 'Unlike virtual reality systems, there is no physical connection between the individual and the avatar. We can now create a very realistic model of the user's head and map his or her facial expression directly to that model. This generates interaction that will feel like normal conversation,' he says.
According to Sheppard, BT Labs has already held virtual conferences involving several hundred people using simple avatars. 'Right now the applications tend to be games, but the virtual conferencing aspect of all this gives it a serious business dimension as well,' he notes.
After a visit to Olivetti's research laboratory in Cambridge last year, senior executives from Oracle were so impressed that the company decided to throw its weight behind the lab's activities. Neither company is disclosing the precise sum Oracle has invested, but it was sufficient to allow Olivetti's vice president of research and Cambridge University reader Andy Hopper to double the lab's staff.
Now operating under its new name, ORL (Oracle & Olivetti Research Laboratory) has been at the forefront of research designed to make the NC a reality.
This interest in an NC is not entirely coincidental. ORL claims to have pioneered ATM, spinning off the technology some time ago in ATM Ltd, which is now part of the Olivetti Telemedia Group, a collection of companies with communications interests.
The role of the lab, according to Olivetti, is to get new technology into the sort of shape that will enable it to be spun off into a successful business in its own right.
Another instance of this is Telemedia Systems, which offers a concept pioneered by ORL known as corporate TV. The idea is to take an ordinary NT server and add multimedia capabilities with a product called C-Stream NT, turning it into a video-on-demand server. 'We are concerned with all aspects of multimedia,' Hopper explains. 'Our basic idea is to look at ways of enabling the growth of the next wave of computing by making things easier and more manageable for the end user.'
The NC is a crucial concept, he believes, because it takes back the management of the technology and leaves the user free to deploy a wide range of intuitively simple applications, pumped down to the NC from powerful centrally administered servers.
The technology that will drive these 'simple' applications, including video mail, speech recognition and the like, depends on what Hopper calls a highly theoretical use of streaming. His demonstration of the NC uses 'down to the desktop' ATM, which gives him the bandwidth to hold the NC on his lap and click at will between real-time TV, video calls to colleagues and dozens of other applications.
'We are investigating the use of server clusters to complement the NC. When you have systems that have to support many users, all with NCs that do not have much computing power, then it raises interesting hardware and interconnect issues. This is a challenge for us and we want to achieve a highly economical solution,' Hopper says.
He argues that although the world's first 200 million end-user systems have all been PCs, the vast majority of the next 800 million devices will probably be NCs. 'The PC world is becoming more and more difficult. If computing really is going to reach a billion people and more, things are going to have to get simpler, not more complicated,' he says.
Recognising that a large subset of those 800 million devices will be in the hands, pockets or briefcases of mobile users, ORL is also putting a good deal of effort into developing wireless or radio ATM, otherwise known as RATM.
The key benefits of standard ATM, namely dynamic bandwidth assignment and sharing, plus its predictability and scalability, apply to RATM as well, and make it the ideal vehicle, in Hopper's eyes, for a whole new generation of video on the move applications.
'The proposition that one can only use video conferencing from fixed terminals will seem very strange in a few years' time,' Hopper claims.
RATM will be a powerful competitor to today's in-building wireless LANs, which use spread spectrum techniques and support Ethernet type traffic at relatively low bit-rates.
ORL already has a prototype RATM system operating in the 2.45GHz ISM band, which has the advantage of being licence-free at low power levels.
This provides a data rate of 10Mbps, which enables staff to run compressed video streams on mobiles based on the Advanced RISC Machine (ARM) processor. The network relies on pico-cells, small areas of coverage that are each served by their own base station and interconnected via the lab's wired ATM network.
Although it might be difficult to replicate the architecture nationwide, the key point is that the wireless data cells are much the same as the fixed ATM data cells. The major differences, according to ORL's John Porter, are a preamble added to the front of the header, which enables the receiver to lock on, and a checksum added to the back of the payload, which helps to maintain quality of service.
One potential problem is that in order for the checksum approach to work, the receiver has to signal a valid acceptance to the transmitter for that chunk of data. This makes for a good deal of signal interchange on-the-fly between the two. However, the ATM data format is designed to allow large data throughputs to be dealt with at the hardware level, leaving the processor free to get on with management tasks.
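The framing Porter describes can be sketched in a few lines. This is a rough illustration under stated assumptions rather than ORL's actual format: the 53-byte cell (5-byte header, 48-byte payload) is standard ATM, but the preamble pattern, the choice of CRC-32 as the checksum and the function names are all hypothetical.

```python
import struct
import zlib

ATM_CELL_LEN = 53                 # standard ATM cell: 5-byte header + 48-byte payload
PREAMBLE = b"\xaa\xaa\xaa\xaa"    # hypothetical sync pattern, not ORL's
CHECKSUM_LEN = 4                  # CRC-32 is an assumption; the article names no algorithm
FRAME_LEN = len(PREAMBLE) + ATM_CELL_LEN + CHECKSUM_LEN


def wrap_cell(cell: bytes) -> bytes:
    """Put a preamble in front of an ATM cell and a checksum behind it."""
    if len(cell) != ATM_CELL_LEN:
        raise ValueError("expected a 53-byte ATM cell")
    checksum = struct.pack(">I", zlib.crc32(cell))
    return PREAMBLE + cell + checksum


def unwrap_frame(frame: bytes):
    """Return (cell, ack). ack is False when the sender should retransmit."""
    if len(frame) != FRAME_LEN or not frame.startswith(PREAMBLE):
        return None, False
    cell = frame[len(PREAMBLE):-CHECKSUM_LEN]
    (received,) = struct.unpack(">I", frame[-CHECKSUM_LEN:])
    if zlib.crc32(cell) != received:
        return None, False        # corrupted in flight: no acknowledgement
    return cell, True             # valid: acknowledge this chunk to the sender
```

The boolean returned by `unwrap_frame` stands in for the acceptance signal the receiver sends back to the transmitter for each chunk of data.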
ORL is confident that it can get physical RATM components for the mobile down to the size of PC Cards.
One problem caused by multimedia streams is that, before too many years have passed, organisations are going to find their storage systems cluttered up with material that does not fit easily into today's predominantly text-based search-and-retrieval systems.
There is no easy way, for example, to tell your PC to go off and find you that bit of the video conversation you had with Client A where he seemed to agree to a fee increase of 15% - unless you can remember the file name that you assigned to the clip.
Retrieval by specific file name gets harder the more you work with multimedia, and if the system is assigning file names by date or by random characters, as in a browser cache, for example, then retrieval soon ceases to be worth the effort.
ORL's solution has been to use sophisticated speech recognition algorithms to search on the audio parts of stored video clips. Users, for example, can tell the system to find all video clips with the words 'fee increase', or the phrase, 'sure, charge what you like'.
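One way to picture this kind of retrieval, assuming a recogniser has already turned each clip's audio into time-stamped words, is a simple phrase match over those words. The sketch below is illustrative only: the data layout and the names `find_phrase` and `_normalise` are assumptions, and ORL's algorithms may well search the audio directly rather than a transcript.

```python
from typing import Dict, List, Tuple

Word = Tuple[float, str]  # (start time in seconds, recognised word)


def _normalise(word: str) -> str:
    """Lower-case a word and drop surrounding punctuation."""
    return word.lower().strip(".,!?'\"")


def find_phrase(transcripts: Dict[str, List[Word]], phrase: str) -> List[Tuple[str, float]]:
    """Return (clip name, start time) for every place the phrase is spoken."""
    target = [_normalise(w) for w in phrase.split()]
    if not target:
        return []
    hits = []
    for clip, words in transcripts.items():
        tokens = [_normalise(w) for _, w in words]
        # Slide a window of the phrase's length across the recognised words.
        for i in range(len(tokens) - len(target) + 1):
            if tokens[i:i + len(target)] == target:
                hits.append((clip, words[i][0]))
    return hits
```

Because each hit carries a start time, a player could jump straight to the relevant moment in the clip rather than replaying it from the beginning.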
Despite the best efforts of the labs, however, it will be a few years before pattern recognition and speech recognition mature and merge to the point where you can tell your system to replay the video conversation you had with that bloke with the enormous eyebrows and the face like a squashed pomegranate.