BBC and Press Association select MarkLogic to handle Olympics data

Websites are expecting a huge volume of data during the London Games

The BBC and Press Association websites have selected unstructured data-management software company MarkLogic to ensure they can handle the vast amount of data that will be created during the Olympics.

John Pomeroy, vice president of Europe at MarkLogic, said both companies approached the vendor as they were dealing with similar issues in handling the data.

"They both had a fairly archaic IT infrastructure based on out-of-date database technologies," he said.

Pomeroy said the companies had two main objectives.

"One was to streamline architecture, so that rather than having multiple data types stored in different repositories and trying to pull them together in some sort of user front end, they were looking to manage all of the data types in a common back end depository," he said.

"The other was streamlining processes; both of these agencies are in the real-time news business and therefore it is crucial to have anything new in the right place [on the website]. In both cases they have gone for 'dynamic semantic publishing' in which a correspondent will tag a particular article and then the system will analyse it and decide where it will make the article visible on the site."

Pomeroy added that this means it cuts out a lot of the interaction that previously had to take place throughout the workflow.

John O'Donovan, director of technical architecture and development at the Press Association, said that the key benefit of using MarkLogic was its ability to manage data.

"It's an XML database that stores and manages different types of content such as images and video very well," he said. "There are not that many products that are good at that. If we went for an off-the-shelf solution we would have had to spend time customising it to fit our needs."

The Press Association has integrated MarkLogic with BigOWLIM, a metadata store from semantic technology developer Ontotext. The metadata store is similar to a database management system (DBMS) in that it allows for the storage, querying and management of structured data.

However, unlike a DBMS, it uses ontologies as semantic schemata, which allows it to analyse the data automatically. Another difference is that it works with flexible, generic physical data models such as graphs, which lets it adopt new ontologies.
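
As a rough illustration of how an ontology can act as a schema over graph data, the sketch below stores facts as subject-predicate-object triples and derives new facts from "subClassOf" links. It is a toy model in plain Python, not BigOWLIM's actual interface; the event names, the predicate names and the infer_types function are all assumptions made for the example.

```python
from typing import Set, Tuple

# A fact is a (subject, predicate, object) triple; the graph is a set of them.
Triple = Tuple[str, str, str]


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Return the triples plus every 'type' fact implied by 'subClassOf' links."""
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new fact can be derived
        changed = False
        for s, p, o in list(inferred):
            if p != "type":
                continue
            for s2, p2, o2 in list(inferred):
                if p2 == "subClassOf" and s2 == o:
                    new_fact = (s, "type", o2)
                    if new_fact not in inferred:
                        inferred.add(new_fact)
                        changed = True
    return inferred


# Hypothetical ontology (first two triples) and one piece of data (third triple).
triples = {
    ("Sprint", "subClassOf", "AthleticsEvent"),
    ("AthleticsEvent", "subClassOf", "OlympicEvent"),
    ("mens_100m_final", "type", "Sprint"),
}

facts = infer_types(triples)
```

In this sketch, extending the ontology is just a matter of adding more triples to the same set; no table definitions need to change, which is the flexibility the graph model is meant to provide.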

"We're providing services to the editorial team at LOCOG using both MarkLogic and semantic technologies. The biggest problem is finding how to relate the story to the page. Semantic technology automatically helps to categorise the content that goes out so after a story is written about a particular event, rather than [staff] selecting metadata from drop-down menus, it automatically manipulates the data," O'Donovan explained.

"From LOCOG's point of view it puts the story on the right site by itself, and if you write a story that has several events, it could go up in several areas of the site."
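
The placement behaviour O'Donovan describes can be sketched as a lookup from the concepts found in a story to the site pages that cover them. This is a simplified assumption about how such routing might work, not the Press Association's implementation; the concept names, page paths and the pages_for_story function are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical mapping from a recognised concept to the pages that cover it.
PAGES_BY_CONCEPT: Dict[str, List[str]] = {
    "mens_100m": ["/athletics", "/athletics/100m"],
    "womens_road_race": ["/cycling", "/cycling/road"],
}


def pages_for_story(concepts: Set[str]) -> List[str]:
    """Collect every page linked to any concept found in the story."""
    pages: Set[str] = set()
    for concept in concepts:
        # Unknown concepts simply contribute no pages.
        pages.update(PAGES_BY_CONCEPT.get(concept, []))
    return sorted(pages)


# A story that mentions two events is placed in both areas of the site.
story_pages = pages_for_story({"mens_100m", "womens_road_race"})
```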

The Press Association uses WebSphere to connect its application stack of MarkLogic, BigOWLIM and front-end interfaces. O'Donovan said this stack was better than a content management system (CMS) because it is less restrictive.

"A CMS often imposes some model on an organisation's content, but as no two businesses are the same, it is important to understand what the business is actually for. What semantic technology does is make you think about how you want to organise your content," O'Donovan said.

The size of the task, Pomeroy said, was huge, and had to fit around other plans.

"The BBC wanted to rationalise its infrastructure by closing 200 of its 400 websites, reducing costs by 25 per cent and doubling traffic to its site," he said.

"The volume of data that is going to be generated this year is going to be huge. For example, the BBC predicts that during the 100m finals, it will service a third of the UK's internet traffic. Meanwhile, every Saturday the Press Association will get data from every football game. That is every touch of every ball in every football game; who passed it, who it was passed to, and whether the pass was completed. They will manage this source data and distribute it to all of the other press," he said.
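
To give a sense of the shape of the football data Pomeroy describes, the sketch below models a single pass as a record and computes one player's completion rate. The field names, identifiers and the completion_rate function are assumptions for illustration; the article does not describe the Press Association's actual data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PassEvent:
    """One pass in a match (hypothetical fields)."""
    match_id: str
    passer: str
    receiver: str
    completed: bool


def completion_rate(events: List[PassEvent], passer: str) -> float:
    """Share of a player's attempted passes that were completed."""
    attempts = [e for e in events if e.passer == passer]
    if not attempts:
        return 0.0  # no attempts recorded for this player
    return sum(e.completed for e in attempts) / len(attempts)


# Made-up sample data for one match.
events = [
    PassEvent("match-1", "player_a", "player_b", True),
    PassEvent("match-1", "player_a", "player_c", False),
    PassEvent("match-1", "player_b", "player_a", True),
]

rate_a = completion_rate(events, "player_a")
```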