Merlin DocuStor

Automate with SimpleIndex

What scanning WON'T do

Replacing paper with electronic files saves space and generally makes the working environment a much nicer place. What it won’t do, left to itself, is help you find information when you need it. This requires indexing – the creation of searchable metadata (a techie term for information that’s attached to a document but not a part of it).

A difficult choice

DocuStor offers a simple means of viewing each scanned document in turn and applying this metadata by hand. Knowing how much manpower to put into this operation is a fine balancing act, and there are two broad approaches:

  • Put in a lot of man-hours up front to input every conceivable bit of information likely to be searched for, so that anyone looking for a document, not quite knowing what they’re looking for, can search on whatever scraps of information they have with a sporting chance of finding it.

  • Take the view that retrieval will be so infrequent that spending more than minimal effort on this function is not worthwhile in terms of return on the investment.

So you have to choose: low expenditure at the front end or low expenditure at the back end. Choose unwisely and your document scanning project could end up with additional costs that outweigh the benefits of moving away from your paper files.

Why choose? Have both!

Many documents contain standard information that can be identified and automatically extracted for use as searchable indexes. Furthermore, this can often be linked with information from other sources to provide a powerful database enabling a user to find a document on the flimsiest of known information. And with the right tools, this can be achieved with the very minimum of manual intervention, so the circle is squared – you have the means of retrieving any document in seconds without having spent a fortune on providing the means to do so.

SimpleIndex – your flexible friend

SimpleIndex is a scanning program that not only scans documents, but reads the content. It does its work using a number of techniques:

Optical Mark Recognition

Optical Mark Recognition lets you define check box regions on scanned images. OMR is very fast and can be used for a variety of applications:

- Business reply mail
- Simple surveys
- Separating multi-page documents
- Document routing control
- Verifying the presence of signatures
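
As a rough illustration of the idea behind check box regions, the sketch below measures how much of a zone is dark and treats the box as ticked once that share passes a threshold. It is not SimpleIndex's own method: the zone layout `(top, left, height, width)`, the `0.3` threshold and the function names are all assumptions made for this example, and the "page" is a tiny hand-made grid of black and white pixels.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]           # rows of pixels: 0 = white, 1 = black
Zone = Tuple[int, int, int, int]  # (top, left, height, width), a hypothetical layout


def fill_ratio(image: Image, zone: Zone) -> float:
    """Return the fraction of dark pixels inside the zone."""
    top, left, height, width = zone
    dark = sum(
        image[row][col]
        for row in range(top, top + height)
        for col in range(left, left + width)
    )
    return dark / (height * width)


def is_marked(image: Image, zone: Zone, threshold: float = 0.3) -> bool:
    """Treat the check box as ticked when enough of the zone is dark."""
    return fill_ratio(image, zone) >= threshold


# A tiny 4 x 8 "scan": the left box is filled in, the right one has a stray speck.
page: Image = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

zones: Dict[str, Zone] = {
    "subscribe": (1, 1, 2, 2),
    "unsubscribe": (1, 5, 2, 2),
}

results = {name: is_marked(page, zone) for name, zone in zones.items()}
print(results)
```

The threshold is the design choice that matters here: set it too low and specks of dirt count as ticks, too high and a light pencil mark is missed.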

Zone Optical Character Recognition (OCR)

Zone OCR is used to read document indexes or tags from text on the page. Zone OCR is a great way to automate the data entry associated with scanning documents. However, there are several limitations to TRADITIONAL zone OCR that must be overcome:

- Index information must be in the exact same place on every page
- Documents shift and skew during scanning, so the zones no longer line up
- If surrounding lines or text on the document are too close, they can encroach on the zone

Dynamic OCR

SimpleIndex overcomes zone OCR limitations by using Dynamic OCR technology to locate the desired text even when it moves around on the page. Our simplified version of Dynamic OCR works great for many types of documents at a fraction of the cost of other solutions.

- Index information can appear anywhere on any page
- Unwanted characters are automatically ignored
- Find unique patterns of letters and numbers using Template Matching
- Use Dictionary Matching to find a value from a list of possible values
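
To make the last two points more concrete, here is a minimal sketch of how template and dictionary matching might work on text that OCR has already produced. It is an illustration only, not SimpleIndex's implementation: the template syntax (`A` for a capital letter, `9` for a digit), the function names, the sample text and the `0.8` similarity cutoff are all assumptions.

```python
import difflib
import re
from typing import List, Optional


def template_to_regex(template: str) -> re.Pattern:
    """Turn a template such as 'AA-999999' into a regular expression.

    Hypothetical syntax: 'A' = any capital letter, '9' = any digit,
    anything else must appear literally.
    """
    parts = []
    for ch in template:
        if ch == "A":
            parts.append("[A-Z]")
        elif ch == "9":
            parts.append("[0-9]")
        else:
            parts.append(re.escape(ch))
    # The lookarounds stop the match from starting or ending mid-word.
    return re.compile("(?<![A-Za-z0-9])" + "".join(parts) + "(?![A-Za-z0-9])")


def find_by_template(ocr_text: str, template: str) -> Optional[str]:
    """Return the first piece of text that fits the template, wherever it sits."""
    match = template_to_regex(template).search(ocr_text)
    return match.group(0) if match else None


def find_by_dictionary(
    ocr_text: str, allowed: List[str], cutoff: float = 0.8
) -> Optional[str]:
    """Return the allowed value closest to any word in the text, if close enough."""
    best, best_score = None, cutoff
    for word in re.findall(r"[A-Za-z0-9]+", ocr_text):
        for value in allowed:
            score = difflib.SequenceMatcher(None, word.upper(), value.upper()).ratio()
            if score >= best_score:
                best, best_score = value, score
    return best


# Made-up OCR output with a misread letter in the department name.
ocr_text = "ACME Ltd  Invoice No: IN-004217  Dept: Accounfs  Date 03/02"
departments = ["Accounts", "Sales", "Purchasing"]

invoice_no = find_by_template(ocr_text, "AA-999999")
department = find_by_dictionary(ocr_text, departments)
print(invoice_no, department)
```

Neither function cares where on the page the value was printed, which is the point of the approach; the dictionary lookup also tolerates the odd misread character.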

Full page OCR

Some documents are difficult or impossible to automate with OCR, for example those with non-standard layouts, unconstrained handwriting or poor originals. In applications like invoice processing, fully automating the data entry can require expensive software and weeks of consulting, and is NEVER 100% effective. SimpleIndex offers a low-cost, semi-automated solution. Simply:

- Scan a batch of documents with full OCR
- Place the SimpleIndex window side-by-side with your data entry window
- Highlight, copy and paste from OCR where possible and manually enter where not
- Save the image and repeat for the next one

Barcode reading

Barcode recognition is the most efficient way to capture index data printed on documents. Some documents already have key information in barcode format on them. In many cases adding a barcode to a document is as simple as changing or adding a font. Adding barcodes to new documents is preferable as all the index data is on the document at the time it is created and in a format that can be read with near 100% accuracy.
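
The "changing a font" route can be sketched as follows, assuming a Code 39 barcode font is used (the text above does not name a symbology, so this is an assumption). Code 39 covers digits, capital letters and a handful of symbols, and readers expect an asterisk as the start and stop character, so the index value has to be wrapped before the font is applied. The function name is hypothetical.

```python
# The 43 data characters that Code 39 can encode.
CODE39_CHARS = set("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%")


def code39_text(value: str) -> str:
    """Return the text to print in a Code 39 font for the given index value."""
    value = value.upper()  # Code 39 has no lower-case letters
    unsupported = sorted(set(value) - CODE39_CHARS)
    if unsupported:
        raise ValueError(f"Not encodable in Code 39: {unsupported}")
    # '*' is the start/stop character that barcode readers look for.
    return f"*{value}*"


label = code39_text("inv-004217")
print(label)
```

The resulting string is what would be placed on the document template and formatted in the barcode font.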

Open database integration

Open database integration is a powerful feature of SimpleIndex® and one that furthers its interoperability with custom programs. Instead of using a proprietary database, SimpleIndex allows you to map its index fields to columns in any database table. The result is a document database with many different possible search fields, of which only one needs to be entered during scanning.

- SimpleIndex uses a database lookup to retrieve records that match a key value
- The key value may be entered manually or read automatically using barcode recognition or OCR
- Blank index fields are then filled in automatically with the data from this lookup
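
The three steps above can be sketched with Python's built-in `sqlite3` module. This is only an illustration of the lookup idea, not how SimpleIndex is configured: the `customers` table, its columns, the sample record and the `fill_blank_fields` function are all made up for the example.

```python
import sqlite3
from typing import Dict, Optional


def fill_blank_fields(
    conn: sqlite3.Connection,
    index: Dict[str, Optional[str]],
    key_field: str,
) -> Dict[str, Optional[str]]:
    """Look up the record matching the key value and fill any blank index fields."""
    columns = list(index)
    # Field names are assumed to come from trusted configuration, not user input;
    # only the key value itself is passed as a query parameter.
    query = f"SELECT {', '.join(columns)} FROM customers WHERE {key_field} = ?"
    row = conn.execute(query, (index[key_field],)).fetchone()
    filled = dict(index)
    if row is None:
        return filled  # no matching record: leave the fields as they were
    for column, value in zip(columns, row):
        if not filled[column]:
            filled[column] = value
    return filled


# A hypothetical customer table standing in for an existing business database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (account_no TEXT PRIMARY KEY, name TEXT, postcode TEXT)"
)
conn.execute("INSERT INTO customers VALUES ('A1001', 'J. Smith', 'AB1 2CD')")

# Only the account number was captured at scan time (typed, barcode or OCR).
scanned = {"account_no": "A1001", "name": None, "postcode": None}
record = fill_blank_fields(conn, scanned, key_field="account_no")
print(record)
```

One key value captured during scanning is enough to make the document searchable by every other column the lookup returns.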

What you end up with is an indexing file that can be uploaded directly into DocuStor at the same time as the scanned files. Fast access with minimal manpower.