Getting your printed data into electronic form

21 Apr 2009

When it comes to document management systems (DMS), one thing is for sure – it is a lot cheaper to get going by having a scanning agency scan your documents than to invest in everything a dedicated document management solution requires. But does cheaper mean better? Once scanned documents start to build up, and this will happen quite quickly, you may look back and think: if only.

You really need to put some thought into how you manage your scanning agencies

What is a document management system? In its simplest form it is a software application which should provide mechanisms for the capture, storage, indexing, locating, retrieval and auditing of documents. To enable auditing, a good document management system should have administrative management facilities built in, offering user-level accounts – preferably integrated with your existing directory.


Over the last few years I have seen a number of different document management systems and, from what I have seen, document management is still something of an unknown quantity, with many suppliers seemingly unable to nail down exactly where it fits or how it should behave. Consultancy is often sold with these systems, and with good reason – the more questions you ask, the more questions you find.

You may find questions are answered with the line: “That is being addressed in a future release”.

Many systems are also built from a number of optional components – I am really starting to dislike optional components. Demonstration systems are usually filled with all available options, and often you only fully realise how optional they are when you find them missing from the system you just bought. How many times have you been to a demonstration of a software system, thought it was great, and then found that the one which has just arrived is not as fully functional?

The only thing worse than optional components is a demo version that is “not yet publicly available”.

When you do not have tens of thousands of pounds up front for a system implementation but need to address a problem – what do you do? If you think the solution will never cost that much money, remember that you also need to factor in new hardware, which in turn means new rack space, new networking ports, power protection, backup equipment, operating system licences and database software (and licensing), as well as user buy-in – and all of this comes before committing any spend to the actual product. After purchase you will also need ongoing maintenance and support, consultancy and user training, all of which can add up to far more than the expected initial cost of the proposed system.

We identified a requirement for document scanning but did not have the resource to do it ourselves.

User demand, storage space and the speed of locating documents were the driving issues which moved us into getting our paper archives scanned. You could say we were just testing the water, as we only have 50 or so CD-Rom discs holding more than 300,000 files and in excess of 15GB of data in Adobe Acrobat PDF form. Our paper-based archive room is still full to the brim and getting fuller every day, so I have no idea where this will all end – or what challenges are likely to be thrown in our way. But the ones we have faced so far have been interesting enough.




Our first foray into document scanning was a mini adventure. The first CD arrived with a mixture of TIF and PDF files; we soon realised PDF was a reasonable, and accessible, format so have stuck with that ever since. Another problem we faced with these CD-Rom arrivals was inconsistency in the folder and naming structure – especially as we employ a number of different agencies who all seem to have their own ideas of how things should be done. Most of the time, CD-Rom discs arrive with all the images dumped in a root directory. From time to time others have arrived with an accompanying spreadsheet showing how sub-folders have been created or how file names may be cross-indexed. It is only once you start looking for information that you fully realise the impact of the naming structure – so spend time on it and set it in stone. Do not rely on agency innovation here.
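
One way to hold agencies to an agreed naming structure is to check every delivered file name against it before anything is copied to the server. The sketch below is only an illustration: the convention it enforces (department code, document type, scan date and a four-digit sequence number) is a made-up example rather than the one we use, and the function names are hypothetical.

```python
import re
from pathlib import PurePath

# Hypothetical convention, for illustration only:
#   <DEPT>_<doctype>_<YYYYMMDD>_<4-digit sequence>.pdf
# e.g. FIN_invoice_20090421_0001.pdf
# Note: the date part is only checked for being eight digits, not a real date.
NAME_PATTERN = re.compile(
    r"^(?P<dept>[A-Z]{2,5})_(?P<doctype>[a-z]+)_(?P<date>\d{8})_(?P<seq>\d{4})\.pdf$"
)


def parse_name(filename):
    """Return the fields encoded in a file name, or None if it breaks the convention."""
    match = NAME_PATTERN.match(PurePath(filename).name)
    return match.groupdict() if match else None


def find_rejects(filenames):
    """List the file names that do not follow the agreed convention."""
    return [name for name in filenames if parse_name(name) is None]
```

Running `find_rejects` over the contents of a newly arrived disc gives a list you can send straight back to the agency, and `parse_name` shows how a well-chosen file name doubles as metadata.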

Innovation may be great for the scanning agency but it does not help us locate our files consistently.

Internally, we have other challenges. The user base seems to take it as given that we have a never-ending amount of drive space (and search facilities) to find any document a user may be looking for at any given time (we are talking 24x7 here). The user base, quite rightly, may not think twice about conducting a system-wide search for a particular file name – but it only takes a handful of these searches for the server to start to feel the pinch.
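
One possible way to take the pressure off the server is to walk the archive once, say overnight, and answer file-name searches from the resulting index instead of from the disk. The following is a minimal sketch under that assumption; `build_index` and `lookup` are hypothetical names, and a real deployment would need to save the index somewhere and refresh it as new discs are loaded.

```python
import os
from collections import defaultdict


def build_index(root):
    """Walk the archive once and map each lower-cased file name to its full paths."""
    index = defaultdict(list)
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            index[name.lower()].append(os.path.join(folder, name))
    return dict(index)


def lookup(index, filename):
    """Find a file by name (case-insensitive) without touching the disk again."""
    return index.get(filename.lower(), [])
```

Each lookup is then a dictionary access rather than a trawl through every folder on the server, which is where the system-wide searches hurt.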

At this point I should mention SharePoint, which is still sitting in the background – a shame, as it has yet to make its way out of the shade and into the light of the production environment. MOSS 2007 (Microsoft Office SharePoint Server) is a capable platform, once you get your head around some of its complexities, but it requires a shift in thinking from the user base and that has yet to happen. I often look to a future where a document management system takes full advantage of the SharePoint platform and, better yet, integrates into Outlook. For this to really take off, SharePoint needs to step forward into centre stage – only then will its power be realised.

If you are thinking of getting your documents scanned in, here are some pointers based on our lessons learned:

  • Ensure you have sufficient storage space and search facilities for the information
  • Always obtain two copies of the electronic media from your agency
  • Always check both media when they arrive to ensure they are not blank
  • Keep one copy at an offsite location for safe keeping
  • Label the incoming CD-Rom and copy the data into a matching “master” (read-only) location on the server
  • Ensure you employ a good identification system so you can tie each file back to the CD it came from
  • Identify a standard way for your agency to format scanned files, e.g. PDF/TIF
  • Really think about your file-naming structure – this is what you will rely on to find information – think of your filename as your metadata. Make this clear to your scanning agency
  • Plan your file organisation structure on your storage server so information is logically where it needs to be – this will cut down on the number of raw searches required
  • Do a page count of the incoming CD-Rom and check this against the invoice you receive from the agency
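
Several of these checks can be scripted. The sketch below assumes both copies of a disc have been mounted or copied to folders, and that each disc has a label such as "CD-001"; the function names and the label are hypothetical. It flags blank copies, empty files and differences between the two copies, and it records which disc each file came from. The page count is left out because it needs a PDF library.

```python
import hashlib
from pathlib import Path


def checksum(path):
    """SHA-256 of a file, read in chunks so large scans do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(disc_root, disc_label):
    """Map each relative file path on a disc to (disc label, size in bytes, checksum)."""
    root = Path(disc_root)
    entries = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            entries[rel] = (disc_label, path.stat().st_size, checksum(path))
    return entries


def intake_problems(copy_a, copy_b, disc_label):
    """Compare the two copies of a disc and return a list of problems found."""
    first = manifest(copy_a, disc_label)
    second = manifest(copy_b, disc_label)
    problems = []
    if not first:
        problems.append("first copy is blank")
    if not second:
        problems.append("second copy is blank")
    for rel in sorted(set(first) | set(second)):
        if rel not in first or rel not in second:
            problems.append(f"{rel}: missing from one copy")
        elif first[rel][2] != second[rel][2]:
            problems.append(f"{rel}: copies differ")
        elif first[rel][1] == 0:
            problems.append(f"{rel}: empty file")
    return problems
```

An empty list from `intake_problems` means the two copies match and neither is blank. The output of `manifest` can be stored alongside the read-only master copy so that any file can later be traced back to its disc.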
