The relational database - that old workhorse of the IT world - is looking a bit tired. Not that it's ready to be put out to pasture just yet; there are plenty of ways of helping the old beast continue to take the strain of IT.
It's certainly a popular animal. Sales of relational databases are worth $12bn (£7.4bn) a year worldwide, according to IBM, and the profits have made the fortunes of some of the best-known names in the industry.
Numerous service companies present hefty bills to users for relational database implementations. At the same time, an army of application software vendors, led by SAP, rake in the revenues with applications relying entirely on relational database technology.
The big picture is that Oracle has about 40 per cent of the world market, and falling; IBM slightly more than 30 per cent and rising; Microsoft just over 10 per cent and rising; with Sybase stuck at under five per cent, according to researcher IDC.
That's the supply side. But what about the users? Here, things aren't as rosy. Many banks scream - off the record, of course - that they no longer have time to carry out their nightly reconciliation runs with which they resolve all the transactions they've performed in the day.
Banks are running out of time in the batch window between the close of transactions on one day and the influx of transactions on the next.
Many traders of foreign exchange, shares, bonds and commodities have long since abandoned the relational database for alternative technologies.
To take advantage of slight market movements they need fast analysis of tons of data, and relational databases on their own are just not fast enough.
The same is true for telecoms service vendors, such as Vodafone, which have to deal with storing and, just as important, retrieving rapidly increasing mountains of data.
Even enterprise resource planning applications are groaning under the limitations of relational databases. No fewer than 16,500 relational database tables underpin a single SAP R/3 application, according to the vendor.
The sheer size of the logical structures beneath the application shows how the intrinsic elegance of the relational database, as envisaged by its early proponents, is being lost in the murky world of IT applications.
Database origins
Now is an appropriate time to examine the role of the relational database, and the alternatives, especially as we are poised between two significant events.
In April this year, Edgar (Ted) Codd, the creator of the relational database, died aged 79. Next year will be the 40th anniversary of the coining of the term 'database'.
The term was first used in the early 1960s for military applications that were generating what were, for the time, huge amounts of data: whole kilobytes of it in one place on one disk drive.
The commercial world in the early 1960s was enamoured with the idea of 'integrated data processing'. It was less an idea, more a Holy Grail for which the search continues.
The idea of a database (or data base as it was then written) was a fascinating addition to that search. Codd's relational database ideas formed in the late 1960s and became the fourth type of database.
Each generation of database technology tried to solve the problems of the last, and each consists of two parts: a way of organising the data; and an implementation to get data in and out of this form of organisation.
Each generation has been labelled by the way it organises, or sees, the data it is to store, with each term carrying the impression of how the data is to be seen. Flat files, hierarchic and network are the three generations preceding relational.
Codd, being a mathematician, saw data as tables with rows and columns. He was recruited by IBM after gaining a degree from Oxford in mathematics and chemistry, and a stint in the RAF in the Second World War. His book explaining the relational model was dedicated to his comrades in the RAF.
Until Codd's breakthrough with the relational model in the 1970s, developers had to know the physical structure of the data. They had to know, and keep track of, the disk addresses for storage.
Codd's mission was to release them from this burden, so that developers could focus on the logical structure of the data.
As he wrote when introducing the relational model: "Future users of large data banks must be protected from having to know how the data is organised in the machine."
His relational approach was, at the time, elegant, compact and aimed directly at the problem developers faced.
IT historian Martin Campbell-Kelly explains the attraction of the relational approach: "It was the gift of eliminating database navigation so that SQL queries were independent of the actual data structures."
Duncan Pauly, chief technology officer at database software start-up CopperEye, added: "When the relational database first came about, people got a warm feeling because it was pure and elegant."
To implement a relational database, developers have to arrange the data in special tables. This arrangement is called the third normal form, where redundant and repeated groups of information are removed to devise clear and workable tables.
It is like a calculus for data handling, the product of a mathematically trained mind looking for simplicity and elegance.
"Developers adopted a relational view of the world. It was a neat and straightforward approach with nice rules to follow," explained Pauly.
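To make the idea concrete, here is a minimal Python sketch, using the standard sqlite3 module, of what removing repeated groups looks like in practice. The customer and order data, the table names and the two helper functions are all invented for illustration and do not come from any product mentioned in this article; the split shown is a simplified picture of normalisation, not a formal demonstration of third normal form.

```python
import sqlite3

# Hypothetical flat records: customer details are repeated on every order.
FLAT_ORDERS = [
    ("C1", "Acme Ltd", "London", "O1", 250),
    ("C1", "Acme Ltd", "London", "O2", 120),
    ("C2", "Bolt plc", "Leeds", "O3", 75),
]


def build_normalised(rows):
    """Split the flat rows into a customers table and an orders table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
    db.execute(
        "CREATE TABLE orders (id TEXT PRIMARY KEY, "
        "customer_id TEXT REFERENCES customers(id), amount INTEGER)"
    )
    for cust_id, name, city, order_id, amount in rows:
        # Each customer is stored once, however many orders refer to it.
        db.execute(
            "INSERT OR IGNORE INTO customers VALUES (?, ?, ?)",
            (cust_id, name, city),
        )
        db.execute(
            "INSERT INTO orders VALUES (?, ?, ?)",
            (order_id, cust_id, amount),
        )
    return db


def orders_with_city(db):
    """Rebuild the original flat view on demand with a join."""
    return db.execute(
        "SELECT o.id, c.name, c.city, o.amount "
        "FROM orders o JOIN customers c ON c.id = o.customer_id "
        "ORDER BY o.id"
    ).fetchall()
```

Calling build_normalised(FLAT_ORDERS) stores each customer's name and city once rather than once per order, so a change of address touches a single row. The price is that every report needing the city must go through the join in orders_with_city.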
Breaking the calculus
But the problem today is that you can't get high performance out of relational databases unless you start breaking Codd's calculus and arranging data in such a way that the purity and elegance of Codd's form is broken.
Developers soon realised that they needed some additional assistance with extra data indexing before they could deliver the performance required.
For example, developers of data warehouses deliberately arrange information in what is known as a 'denormalised' structure, so that it can be delivered quickly enough.
"A lot of data is repeated. It's the only way to get performance," said Pauly. So it's almost mandatory to break Codd's rules to get the performance required.
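The trade-off Pauly describes can be sketched in a few lines of Python with the standard sqlite3 module. The stores, sales and sales_flat tables and the figures in them are hypothetical, chosen only to show the shape of the idea; a real data warehouse would be far larger and would be designed with much more care.

```python
import sqlite3


def build_warehouse():
    """Create a tiny normalised schema plus a denormalised copy of it."""
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE stores (id INTEGER PRIMARY KEY, region TEXT);
        CREATE TABLE sales (id INTEGER PRIMARY KEY, store_id INTEGER, amount INTEGER);
        INSERT INTO stores VALUES (1, 'North'), (2, 'South');
        INSERT INTO sales VALUES (1, 1, 100), (2, 1, 50), (3, 2, 70);

        -- Denormalised copy: the region is repeated on every sales row.
        CREATE TABLE sales_flat AS
            SELECT s.id, s.amount, st.region
            FROM sales s JOIN stores st ON st.id = s.store_id;
        """
    )
    return db


def totals_normalised(db):
    """Regional totals from the normalised tables: needs a join per query."""
    return db.execute(
        "SELECT st.region, SUM(s.amount) FROM sales s "
        "JOIN stores st ON st.id = s.store_id "
        "GROUP BY st.region ORDER BY st.region"
    ).fetchall()


def totals_denormalised(db):
    """The same totals from the flat copy: one table, no join."""
    return db.execute(
        "SELECT region, SUM(amount) FROM sales_flat "
        "GROUP BY region ORDER BY region"
    ).fetchall()
```

Both functions return the same totals, but the second reads a single table. The cost is the repetition itself: if a store moved region, every one of its rows in sales_flat would have to be rewritten, which is exactly the kind of redundancy normalisation was meant to remove.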
Relational database giant Oracle agrees. "What we regard as the relational database now is very different from its pure form," said Tim Payne, senior European director of technology marketing at the company.
"The modern relational database product can support XML, perform FTP services, send and receive email. This is a very long way from Codd's purity and elegance."
But developers have to do this if the relational database is to be preserved. The model has to be amended and extended just to keep the workhorse going.
There are only two alternatives to this sullying of Codd's pure structure: brute force or intelligence. The first involves throwing hardware at the problem.
The second means ditching the relational database and replacing it with another pure and elegant form to handle the 21st century's data storage requirements. Both options are in evidence today.
Hardware choices
Which hardware architecture to use is a point of debate between relational database vendors.
Oracle is interested in racks of Intel blade servers on clusters of processors and hard disk storage devices, and has rewritten parts of its core product to exploit this technology.
Microsoft, according to Oracle, is going up the blind alley of SMP architectures, which Oracle claims will run out of capacity.
"SMP will be alright for a few niche applications but not for the majority of customers," warned Payne.
The latest hardware option is grid, or utility, computing. Grid computing gives seemingly endless increases in raw power and storage capacity with which to mop up the growing technical inefficiencies of relational database software. No wonder the big database vendors, especially IBM, are interested.
We couldn't have relational databases at all without the humble and often overlooked hard disk. It's more than a coincidence that the term database and the removable hard disk shuffled onto the stage of history at about the same time.
The first removable hard disk product was shipped by IBM in 1963, the year before the term 'data base' was coined.
The first removable hard disk stored two million characters on six 13-inch disks. Two years later and the removable disk offered 7.5 million characters.
Three years after that, in 1968, a host of plug-compatible vendors offered their 'plug-and-play' alternatives for IBM removable hard disk users.
These plug-compatible vendors created a lively and entertaining market as IBM fought Memorex, Control Data, Bendix, General Electric and others for the pot of gold.
The user was the beneficiary of this battle, with ever cheaper hardware storage the result. The lawyers benefited too, because IBM's dominance and its counter attacks triggered a series of anti-competition lawsuits on both sides of the Atlantic.
The brute force of ever larger and faster hardware has always helped the performance of databases. But specialist hardware has seldom enjoyed universal success.
"Dedicated hardware for a data store is inherently less flexible than a software implementation," explained Simon Williams, database specialist and designer of a new generation of databases.
The answer, for the supporters of the brute force approach, seems to be to use cheap and cheerful hardware, as opposed to the elegance of specialist hardware.
Uncharted territory
The other option is to reject the relational model and strike out in a new direction. Williams has done just that. He has built an associative model of data, written about it, formed a company to sell it, and implemented it in a product.
Codd freed the developer from the physical structure; the next breakthrough, according to Williams, is to free the developer from knowing the logical structure.
There are others who want to break out of Codd's approach. A select group of start-ups is pushing the notion of XML stores as the next big thing in the world of databases, with the growing popularity of XML giving these vendors some wind in their sails.
Yet we've been here before. The current push behind XML databases is nothing compared with the fervour behind object-oriented databases in the early 1990s.
From the vantage point of 1990, one market research company predicted that by the middle of the decade object-oriented databases would account for half of all database sales.
It just didn't happen. Pioneering users often found that they could put a lot of data into object-oriented databases, but had a hard time getting it out.
Williams explained that object orientation is a programming technique designed to remove the largest source of bugs in software code, which arises when one program overwrites the memory area of another.
It's a fallacy to think that what's good for memory is good for a database, he said, adding that only when complex structures such as CAD/CAM models need to be stored on disk is it worth the effort of object databases.
In any case, the relational vendors extended the features of their databases to handle objects. "We invested a lot to get object data types and support for object methods in the core kernel," said Payne.
There are choices beyond objects. The data could be put in columns rather than the rows of Codd's model. Pauly describes this as "an amazing leap in thought".
Sybase has an implementation but, with less than five per cent of the market, it hardly has the clout to push the issue.
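The difference between the two layouts is easy to picture with a small Python sketch. The trade records and function names below are made up for illustration, and the code says nothing about how Sybase or any other vendor actually stores data; it only shows the same records held row by row and column by column.

```python
# Hypothetical trade records held row by row, as in a conventional table.
ROWS = [
    {"trade_id": 1, "symbol": "AAA", "price": 100.0},
    {"trade_id": 2, "symbol": "BBB", "price": 102.0},
    {"trade_id": 3, "symbol": "AAA", "price": 98.0},
]


def to_columns(rows):
    """Pivot a list of row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def average_price_rows(rows):
    """Row layout: every whole record is visited to pick out one field."""
    return sum(row["price"] for row in rows) / len(rows)


def average_price_columns(columns):
    """Column layout: only the price column is read."""
    prices = columns["price"]
    return sum(prices) / len(prices)
```

Both functions give the same answer. The point of the column layout is that an analytical query over one field touches only that field's values, which matters when a table has dozens of columns and millions of rows; fetching one complete record, by contrast, means collecting a value from every column.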
As for XML stores, Payne claims that early adopters often use them for pilots, but migrate to the more mature relational model for the actual implementation.
Williams has a fundamental concern about XML stores: XML is a grammar, not a language, and an XML store is closer to a document store than to a database. "It's chalk and cheese," he argued. "An XML store isn't scalable and doesn't have a coherent data model."
The relational database workhorse is, indeed, a bit tired. But there are only three alternatives: pollute Codd's pure image of the relational database; use the brute force of more hardware power; or adopt a different model.
So far, déjà vu tells us that pollution and brute force are the best short-term options.
MORE INFORMATION:
Ted Codd's obituary, which outlines the contribution he made to database technology, is on IBM's website.
CopperEye's approach, adaptive addressing, devised by Duncan Pauly, is on www.coppereye.com.
Simon Williams's The Associative Model of Data, published by Lazy Software, offers an insight into the structure and limitations of the relational model, and an introduction to his associative model. A white paper is also available for download.
The commercial history of database companies, mostly with a US slant, is in Campbell-Kelly's book, A History of the Software Industry, from MIT.




