Alto (XML)

The Alto (XML) output format is an open XML schema for describing OCR text. It therefore does not convert the processed document itself; instead, it produces a data output containing the recognized text.

Parameter           Value
Output extension    XML
OCR                 Always (OmniPage & Abbyy)
Multipage support   No
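
As a rough illustration of what can be done with the resulting data output, the following Python sketch reads the recognized text out of an ALTO file using only the standard library. The sample document, the ALTO v3 namespace, and the helper names `local_name` and `extract_lines` are assumptions made for this example; the ALTO version and level of detail written by the product may differ. The element and attribute names (`TextLine`, `String`, `CONTENT`) come from the public ALTO schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed ALTO sample (not schema-complete).
# The namespace assumes ALTO v3; real output may use another version.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2480" HEIGHT="3508">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine HPOS="200" VPOS="300" WIDTH="420" HEIGHT="40">
            <String CONTENT="Hello" HPOS="200" VPOS="300" WIDTH="180" HEIGHT="40"/>
            <SP WIDTH="20"/>
            <String CONTENT="world" HPOS="400" VPOS="300" WIDTH="220" HEIGHT="40"/>
          </TextLine>
          <TextLine HPOS="200" VPOS="360" WIDTH="160" HEIGHT="40">
            <String CONTENT="ALTO" HPOS="200" VPOS="360" WIDTH="160" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works for any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def extract_lines(alto_xml):
    """Return one string per TextLine, joining its String elements with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) != "TextLine":
            continue
        words = [
            child.get("CONTENT", "")
            for child in elem
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines
```

Calling `extract_lines(SAMPLE_ALTO)` returns one entry per text line. Because multipage output is not supported, a consumer would typically apply such a function to each generated XML file in turn.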

Engine
Select the OCR engine used to run the current task and create the output document. Depending on the current license, the available engines are:

  • Nuance OmniPage
  • Abbyy FineReader

Select the language to use during OCR. To select multiple languages, hold the CTRL key while selecting them.

Please refer to the OCR Appendix chapter for the supported OCR languages.

Timeout
Specify the maximum amount of time the OCR process may run. When this limit is reached, the process is terminated with an error. Use this setting to prevent the OCR process from taking too long, hanging, or looping on particularly complex or malformed documents.

The timeout value is expressed in seconds.
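
The behavior described above can be sketched in Python as follows. This is only a generic illustration of a hard timeout around an external process, not the product's actual implementation; the function name `run_with_timeout` and the stand-in commands `slow_job` and `fast_job` are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and return (succeeded, detail).

    If the command runs longer than timeout_seconds, it is killed
    and the call is reported as failed.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return False, f"timed out after {timeout_seconds} s"
    if completed.returncode != 0:
        return False, f"exited with code {completed.returncode}"
    return True, completed.stdout.strip()


# Stand-ins for an OCR job that hangs and one that finishes quickly.
slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
fast_job = [sys.executable, "-c", "print('done')"]
```

With these stand-ins, `run_with_timeout(slow_job, 0.5)` reports a failure after about half a second, while `run_with_timeout(fast_job, 10)` completes normally. A timeout that is too short makes legitimate large documents fail, so the value is best chosen with the slowest expected document in mind.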

Abbyy

When the Abbyy FineReader engine is selected, additional format options are displayed.

Enhance local contrast
If enabled, the engine increases the local contrast of the image during preprocessing. This option may improve recognition quality.

Info

The option is meaningful for color and gray images only.

The images for which this preprocessing method is effective include:

  • Photos or scans of documents with texture or pictures in the background. With the normal binarization procedure, the characters that coincide with darker areas of background may be lost or recognized unreliably. If you apply this method before recognition, such areas are detected, and contrast is increased, with the result that after binarization the characters stand out more distinctly.
  • Photos or scans of documents with highly colorful background or text highlighting.
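
To make the first case more concrete, here is a toy Python sketch of why a single global threshold can lose characters on a darker background and how stretching the contrast locally can help. It is a simplified illustration on one scanline of gray values, not the algorithm used by the engine; the function names, the tile size, and the threshold of 128 are assumptions chosen for the example.

```python
def binarize(pixels, threshold=128):
    """Global binarization: 0 = black, 1 = white."""
    return [0 if p < threshold else 1 for p in pixels]


def stretch_tile(pixels):
    """Stretch the gray values of one tile to the full 0..255 range."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        # A flat tile has no contrast to enhance; leave it unchanged.
        return list(pixels)
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


def enhance_local_contrast(pixels, tile_width):
    """Apply the contrast stretch independently to each tile."""
    out = []
    for start in range(0, len(pixels), tile_width):
        out.extend(stretch_tile(pixels[start:start + tile_width]))
    return out


# One scanline: the left half has a light background (200) with a text
# pixel (120); the right half has a dark background (90) with a text pixel (30).
scanline = [200, 200, 120, 200, 90, 90, 30, 90]

plain = binarize(scanline)
enhanced = binarize(enhance_local_contrast(scanline, tile_width=4))
```

In `plain`, the whole right half turns black, so the text pixel there cannot be told apart from its background. In `enhanced`, the text pixel is black and the background is white in both halves.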

Remove noise
If enabled, the engine reduces the noise in the image. The available modes are:

  • White noise: this mode may be useful, for example, for uncompressed images with ISO less than 800 or for reduced images.
  • Correlated noise: this mode may be useful, for example, for JPEG photos with high compression settings.

Info

The option is meaningful for color and gray images only.
