How do I OCR documents in PDF-XChange Editor and PDF-XChange Viewer?

Question:

How do I perform OCR on documents? 

How do I convert image-based documents into text-searchable documents?

Answer:

OCR (optical character recognition) scans image-based documents, recognizes the text in them, and then inserts an invisible text layer over that text. The text layer contains the same text as that recognized in the document, so the original, image-based text can be searched and selected via the invisible layer. This is the main benefit of OCR. Note, however, that the document text cannot be edited in the same manner as normal, text-based documents, because the document remains image-based despite the invisible text layer. Follow the steps below to perform OCR:

PDF-XChange Editor

1. Click Convert in the Ribbon Toolbar, then click OCR Page(s) in the submenu. The OCR Pages dialog box will open:

The Page Range options are as follows:

  • Select All to OCR all the pages of the document.
  • Select Current Page to OCR only the current page.
  • Use the Pages box to determine specific pages of the document on which to perform the OCR process. Page range settings are detailed here.
  • Use the Subset option to select All Pages, Odd Pages Only or Even Pages Only.
  • The Recognition options determine the language and accuracy of the OCR process. If the desired language is not available in the dropdown menu, click More Languages for further options. Increasing the accuracy increases the time that the process takes, and vice versa. Note that setting the accuracy to high may produce unusual output if the document contains imperfections, because the software searches to a greater depth and may attempt to recognise imperfections as text.
  • The Output options determine the format of the output information from the OCR process. Select either Create New Searchable PDF or Preserve Original Content and Add Text Layer as desired.
    • Create New Searchable PDF duplicates the current file and performs the OCR process on the new PDF. This is a good option if you wish to test the results while leaving the current file unaffected.
    • Preserve Original Content and Add Text Layer preserves the original content and places the OCR'd text layer above it. This method does not create a new document; it alters the current document by adding searchable text.
  • The Quality setting determines the resolution of the new PDF document in dpi (dots per inch).
  • Select the Auto Deskew option to deskew documents automatically. (Deskewing is a useful feature that straightens images that have been photographed or scanned crookedly).

2. Click OK to OCR documents.
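
The way the Pages box and the Subset option combine can be illustrated with a short sketch. This is not PDF-XChange code: the function names `parse_page_ranges` and `apply_subset` are hypothetical, the comma-and-hyphen syntax (for example "1-4, 7") is an assumed simplification of the real page range settings, and it is assumed that the odd/even subset is applied to page numbers.

```python
def parse_page_ranges(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "1-3, 7" into sorted 1-based page numbers.

    Assumed syntax: comma-separated single pages or "start-end" ranges.
    """
    pages = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        if start < 1 or end > page_count or start > end:
            raise ValueError(f"invalid page range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


def apply_subset(pages: list[int], subset: str = "all") -> list[int]:
    """Filter pages by an assumed Subset value: "all", "odd" or "even"."""
    if subset == "all":
        return list(pages)
    if subset == "odd":
        return [p for p in pages if p % 2 == 1]
    if subset == "even":
        return [p for p in pages if p % 2 == 0]
    raise ValueError(f"unknown subset: {subset!r}")


# Pages 1-4, 7 and 9-10 of a 12-page document, odd pages only.
selected = apply_subset(parse_page_ranges("1-4, 7, 9-10", 12), "odd")
print(selected)  # [1, 3, 7, 9]
```

In this sketch only the pages in `selected` would be sent to OCR; narrowing the range in this way is also a simple means of testing recognition settings on a few pages before processing a long document.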

Please note that it is also possible to OCR documents when scanned content or images are used to create PDF documents:

Images

1. Click File in the Ribbon Toolbar, then click New Document and click From Image File(s):

The Images to PDF dialog box will open:

2. Add files and determine settings as detailed here.

3. Click Options for further options. The Image to PDF Options dialog box will open. Click Image Post-Processing to view OCR options when images are converted to PDF:

4. Select the Run OCR box to OCR images when they are converted to PDF. Click OCR Settings to determine language and accuracy options, as detailed above.

Scanned Content

1. Click File, then click New Document. 

2. Click From Scanner, then click Custom Scan:

3. The Scan Properties dialog box will open:

4. Determine settings as detailed here.

5. Click Images Insertion Options to determine options for inserted images. The Image to PDF Options dialog box will open. Click Image Post-Processing to view OCR options when scanned content is converted to PDF:

6. Select the Run OCR box to OCR images when they are converted to PDF. Click OCR Settings to determine language and accuracy options, as detailed above.

PDF-XChange Viewer

1. Click Document in the Menu Toolbar, then click OCR Pages in the submenu (or press Ctrl+Shift+C). The OCR Pages dialog box will open:

The Pages Range options are as follows:

  • Select All to OCR all the pages of the document.
  • Select Selected Pages to OCR only the pages currently selected in the document.
  • Select Current Page to OCR only the current page.
  • Select Pages to determine specific pages of the document on which to perform the OCR process. Enter the desired page range(s) in the text box.
  • The Recognition options determine the language and accuracy of the OCR process. If the desired language is not available in the dropdown menu, click More Languages for further options. Increasing the accuracy increases the time that the process takes, and vice versa. Note that setting the accuracy to high may produce unusual output if the document contains imperfections, because the software searches to a greater depth and may attempt to recognise imperfections as text.
  • The Output options determine the format of the output information from the OCR process:
    • Select Preserve Original Content & Add Text Layer to have PDF-XChange Viewer analyze the document, recognize text and then insert an invisible text layer over the text. Note: the text layer contains the same text as that recognized in the document, so the original, image-based text can be searched and selected via the invisible layer. The document text still cannot be edited in the same manner as normal, text-based documents, because the document remains image-based despite the invisible text layer.
    • Select Convert Page Content to Image only - Add Text As a Layer to convert documents that contain both images and text into a single, consolidated image. If this option is selected, use the Images Quality dropdown menu to determine the resolution in dpi (dots per inch) of the created image. Note: if this mode is used for image-only documents, the only change will be the resolution of the image, and only when the initial dpi differs from the dpi specified in the Images Quality dropdown menu; otherwise no changes will occur. Output documents from this process replace the input documents, so make a copy first if the input documents will be needed in their original format.

2. Click OK to OCR documents.
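
To make the idea of the invisible text layer more concrete, the following is a minimal conceptual sketch. It does not reflect how PDF-XChange stores OCR results internally: the `Word` and `OcrPage` classes, their fields and the sample coordinates are all hypothetical. It only illustrates why an OCR'd page can be searched and selected while the visible content remains an image.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognized word and where it sits on the page (hypothetical model)."""
    text: str
    x: float
    y: float
    width: float
    height: float


@dataclass
class OcrPage:
    """A page image plus an invisible layer of recognized words."""
    image_name: str      # the visible content stays an image
    words: list[Word]    # the text layer positioned over that image

    def search(self, term: str) -> list[Word]:
        """Return the words whose text contains the term, ignoring case."""
        needle = term.casefold()
        return [w for w in self.words if needle in w.text.casefold()]


page = OcrPage(
    image_name="scan-001.png",
    words=[
        Word("Invoice", x=72, y=720, width=60, height=12),
        Word("Total", x=72, y=650, width=40, height=12),
    ],
)
hits = page.search("total")
print([(w.text, w.x, w.y) for w in hits])
```

A search returns positions from the text layer, which is how a viewer could highlight a match on top of the scanned image. Changing a `Word` in this model would not alter the pixels of the image, which mirrors why OCR'd text cannot be edited like normal, text-based content.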
