Google re-releases open source OCR software

Tesseract code unearthed from the HP crypt

Written by Matt Chapman

Google has re-released an open source version of optical character recognition (OCR) software originally produced by HP.

The Tesseract program was developed by HP between 1985 and 1995, and in its final year ranked among the top three OCR packages in a competition organised by the University of Nevada, Las Vegas (UNLV).

Google said in a statement that, although some people might wonder why the search giant was interested in OCR technology, it fitted in with the company's plans to make information available online.

"We are all about making information available to users, and when this information is in a paper document, OCR is the process by which we can convert the pages of this document into text that can then be used for indexing," said Eric Case on the official Google Code blog.
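As a rough illustration of the indexing step Case describes, the sketch below builds a small inverted index over text that is assumed to have already come out of an OCR engine. It does not call Tesseract or reflect how Google indexes documents; the page ids, sample text and the helper names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the sorted list of page ids whose text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, query):
    """Return the page ids that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)


# Hypothetical OCR output: page id -> recognised text (not real Tesseract output).
ocr_pages = {
    "scan-001": "Tesseract was developed by HP between 1985 and 1995.",
    "scan-002": "Google re-released Tesseract as open source.",
}

index = build_index(ocr_pages)
print(search(index, "tesseract hp"))  # prints ['scan-001']
```

Once scanned pages exist as plain text, a lookup like this can find them by the words they contain, which is not possible while they are only images.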

HP stopped working on Tesseract in 1995 and released the code to the Information Science Research Institute at UNLV a couple of years ago so that it could be developed for open source. 

"UNLV was happy to oblige, but they asked for our help in fixing a few bugs that had crept in since 1995 (ever heard of bit rot?)," wrote Case.

"We tracked down the most obvious ones and decided a couple of months ago that Tesseract OCR was stable enough to be re-released as open source."

Google originally chose to keep the launch low-profile, but today's announcement includes an advert for engineers to work on the project.

The software currently supports only English, does not include a page layout analysis module, struggles with greyscale and colour documents, and will not match the accuracy of the best commercial OCR packages currently available.

"Yet, as far as we know, despite its shortcomings, Tesseract is far more accurate than any other open source OCR package out there," wrote Case.
