Word (docx)

The Word (docx) output format converts the current document into a Microsoft Word document, specifically in the DOCX format.

Parameter            Value
Output Extension     DOCX
OCR                  Always (OmniPage & Abbyy)
Multipage support    Yes

Engine
Select the OCR engine used to run the current task and create the output document. Depending on the current license, the available engines are:

  • Nuance OmniPage
  • Abbyy FineReader

Select the language to use during the OCR process. To select multiple languages, hold the CTRL key while selecting them.

Please refer to the OCR Appendix chapter for the supported OCR languages.

Timeout
Specify the maximum amount of time the OCR process may run. When this limit is exceeded, a timeout occurs and the process is terminated with an error. Use this option to prevent the OCR process from taking too long, hanging or looping on particularly complex or malformed documents.

The timeout value is expressed in seconds.
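
The timeout behavior described above can be pictured with a minimal Python sketch. It is not the product's implementation: the helper name `run_with_timeout` and the idea of running the OCR step as an external command are assumptions made only for illustration.

```python
import subprocess
import sys


def run_with_timeout(command, timeout_seconds):
    """Run a command and return (ok, message).

    Hypothetical helper: if the command runs longer than
    timeout_seconds, subprocess.run kills it and the call is
    reported as an error, mirroring the Timeout option.
    """
    try:
        completed = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,  # expressed in seconds
        )
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout_seconds} seconds"
    if completed.returncode != 0:
        return False, f"failed with exit code {completed.returncode}"
    return True, completed.stdout


if __name__ == "__main__":
    # Stand-in for a slow OCR job: a child process that sleeps 5 seconds.
    slow_job = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow_job, 1))
```

A short timeout protects the workflow from documents that never finish, at the cost of failing large documents that are merely slow, so the value is best chosen from the longest legitimate processing time observed.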

OmniPage

When the Nuance OmniPage engine is selected, additional format options are displayed.

Use frames
If enabled, Microsoft Word frames are used to group paragraphs and compose the document layout. Frames position content more accurately, which gives a more precise layout, but text inside the frames is less easy to edit.

Abbyy

When the Abbyy FineReader engine is selected, additional format options are displayed.

Keep pictures
If enabled, the original pictures are retained when the recognized text is exported to Word.

Keep text color
If enabled, the original text colors are retained when the recognized text is exported to Word.

Enhance local contrast
If enabled, the engine increases the local contrast of the image during preprocessing. This option may improve the quality of recognition.

Info

The option is meaningful for color and gray images only.

The images for which this preprocessing method is effective include:

  • Photos or scans of documents with texture or pictures in the background. With the normal binarization procedure, the characters that coincide with darker areas of background may be lost or recognized unreliably. If you apply this method before recognition, such areas are detected, and contrast is increased, with the result that after binarization the characters stand out more distinctly.
  • Photos or scans of documents with a highly colorful background or text highlighting.
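
To illustrate why characters on darker background areas can be lost by a single global threshold, and how raising local contrast first helps, here is a small pure-Python sketch. It is not the Abbyy algorithm: the tile-based min/max stretch, the function names and the sample pixel values are all assumptions chosen for illustration.

```python
def binarize(image, threshold=128):
    """Global binarization: 1 = dark (ink), 0 = light (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def stretch_local_contrast(image, tile=4):
    """Stretch each tile x tile block to the full 0..255 range.

    Illustrative stand-in for local contrast enhancement.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [
                (y, x)
                for y in range(top, min(top + tile, height))
                for x in range(left, min(left + tile, width))
            ]
            lo = min(image[y][x] for y, x in block)
            hi = max(image[y][x] for y, x in block)
            if hi == lo:
                continue  # flat block: nothing to stretch
            for y, x in block:
                out[y][x] = (image[y][x] - lo) * 255 // (hi - lo)
    return out


# Hypothetical 4x8 grayscale image: light paper (220) on the left,
# dark background (90) on the right, one vertical stroke in each half.
image = [[220, 60, 220, 220, 90, 40, 90, 90] for _ in range(4)]

print(binarize(image)[0])
# → [0, 1, 0, 0, 1, 1, 1, 1]  (the right-hand stroke merges with the background)
print(binarize(stretch_local_contrast(image))[0])
# → [0, 1, 0, 0, 0, 1, 0, 0]  (both strokes stand out)
```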

Remove noise
If enabled, the engine reduces the noise in the image. The available modes are:

  • White noise: this mode may be useful, for example, for uncompressed images with ISO less than 800, or for reduced images.
  • Correlated noise: this mode may be useful, for example, for JPEG photos with high compression settings.

Info

The option can be used only for color and 8-bit gray images.

Title, Author, Subject, Keywords
Enter the text to set as the Title, Author, Subject or Keywords property of the Microsoft Office document, or click the Variable button to select a variable that will supply the value for the property.
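
To check these properties in a generated file, the following Python sketch reads them back from a DOCX package. In the Office Open XML format, core properties are stored in `docProps/core.xml`, where Author maps to `dc:creator`. That this output format writes the four fields to those standard elements is an assumption, and `read_core_properties` is a hypothetical helper name.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a DOCX package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Property label as shown in the options -> element in core.xml.
FIELDS = {
    "Title": "dc:title",
    "Author": "dc:creator",
    "Subject": "dc:subject",
    "Keywords": "cp:keywords",
}


def read_core_properties(docx_path):
    """Return the four document properties; None where a property is absent."""
    with zipfile.ZipFile(docx_path) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for label, tag in FIELDS.items():
        element = root.find(tag, NS)
        result[label] = element.text if element is not None else None
    return result
```

Calling `read_core_properties("output.docx")` on a produced file (the path is a placeholder) returns a dictionary with the four labels as keys, which makes it easy to confirm that a variable was resolved to the expected value.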
