How do I convert a scanned PDF to editable text?
I cannot edit the text of a scanned PDF in the Editor, and the text still cannot be edited after converting the scanned PDF to Word. How can I edit the text of a scanned or image-based PDF?
Scanning a document produces an image of each page. The resulting PDF therefore contains no text data, only images of the pages. To convert these images to editable text, you must run OCR, convert the OCR layer to visible text, then remove the underlying image. Converting this kind of PDF to a Word document is a more involved process, and you will not be able to retain any graphics or images.
OCR (Optical Character Recognition) scans an image to detect text, then overlays the image with an invisible text layer that contains the detected text. This allows anyone who views the file to search for terms within the PDF and to add text-dependent annotations such as Highlight or Strikethrough. To modify the text, you will need to OCR the document, then remove the underlying image. First, go to Convert> OCR pages:
Choose All pages (or whichever page range you prefer). When you are satisfied, click OK:
Page Range Selected allows you to specify the pages to be processed by the OCR engine.
Recognition Language: Some languages use special characters in their alphabets. Select the appropriate language for best results.
Recognition Accuracy: You can choose from three accuracy levels: Low, Medium and High. At High accuracy, the OCR engine examines the page content more closely. This can cause undesirable results on scans with a lot of noise, speckles or other non-textual graphics, because these can be incorrectly identified as characters. In such cases, lowering the accuracy allows the OCR engine to ignore the imperfections.
Output Type can be set to Preserve Original Content and Add Text Layer or Create new searchable PDF. The result is effectively the same; however, the latter allows you to retain the original, unmodified document. Create new searchable PDF also provides access to the Quality and Auto Deskew options.
Quality is accessible only when creating a new searchable PDF. It is defined in DPI (dots per inch) and sets the resolution of the background image.
Auto Deskew is accessible only when creating a new searchable PDF. When a document is scanned, it may not be aligned correctly with the scanner bed, resulting in a crooked, or skewed, image. This feature automatically straightens the image.
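The Quality and Auto Deskew options can be made more concrete with a little arithmetic. The following is only an illustrative Python sketch, not the Editor's own implementation: the function names, the US Letter page size and the 300/150 DPI values are assumptions chosen for the example. It shows how DPI determines the pixel size of the background image, and how a skew angle can be estimated from two points on a line of text.

```python
import math


def pixel_dimensions(width_in, height_in, dpi):
    """Pixel size of a page image rendered at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def grayscale_bytes(width_px, height_px):
    """Uncompressed size of an 8-bit grayscale image (one byte per pixel)."""
    return width_px * height_px


def skew_angle_degrees(x1, y1, x2, y2):
    """Angle of a text baseline through two points, relative to horizontal.

    Assumes y increases upward; a positive result is a counter-clockwise tilt.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


# A US Letter page (8.5 x 11 inches) at two example resolutions.
high = pixel_dimensions(8.5, 11, 300)  # (2550, 3300)
low = pixel_dimensions(8.5, 11, 150)   # (1275, 1650)

# Halving the DPI quarters the number of pixels in the background image.
ratio = grayscale_bytes(*high) / grayscale_bytes(*low)  # 4.0

# A baseline that rises 35 pixels over 1000 pixels is tilted by about 2 degrees;
# rotating the image by the opposite angle would straighten it.
skew = skew_angle_degrees(0, 0, 1000, 35)
correction = -skew
```

A higher Quality value therefore gives a sharper background image at the cost of a larger file, while deskewing amounts to measuring a small tilt and rotating the page image back by the same amount.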
Click OK when satisfied with the OCR presets to run the OCR engine.
You will see the following dialog while the OCR engine processes the document:
When finished, you will be able to make all the necessary annotations to the document:
Editing OCR'd Text
Once the document has been OCR'd, its text can be made editable. First, you will need to turn the invisible text placed by the OCR engine into visible text, then remove the underlying picture. You can then edit the text.
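To illustrate what "invisible text over a picture" means at the file level, here is a minimal Python sketch that builds PDF content-stream operators by hand. In the PDF format, text rendering mode 3 (`3 Tr`) draws text that is neither filled nor stroked, and OCR tools commonly use it for the hidden text layer; mode 0 draws ordinary filled text. Whether PDF-XChange Editor stores and rewrites its OCR layer in exactly this way is an assumption, and the resource names `/F1` and `/Im0`, the sample words and the helper functions are all hypothetical.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_layer(words, render_mode):
    """Build content-stream operators that place each word at its position.

    words: iterable of (text, x, y, font_size) tuples, in PDF points.
    render_mode: 3 = invisible (typical OCR layer), 0 = filled, visible text.
    """
    lines = ["BT", f"{render_mode} Tr"]
    for text, x, y, size in words:
        lines.append(f"/F1 {size} Tf")          # select font (hypothetical name)
        lines.append(f"1 0 0 1 {x} {y} Tm")     # move to the word's position
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    return "\n".join(lines)


# Hypothetical OCR result: (text, x, y, font size).
words = [("Invoice", 72, 720, 12), ("Total (EUR)", 72, 700, 12)]

# Draw the scanned page image across a 612 x 792 point page
# (the image resource name /Im0 is an assumption).
image_ops = "q 612 0 0 792 0 0 cm /Im0 Do Q"

# After OCR: the scan is still drawn, with invisible text on top of it.
searchable_page = image_ops + "\n" + text_layer(words, 3)

# After the editing steps: visible text only, with the scan removed.
editable_page = text_layer(words, 0)
```

The two strings differ only in the rendering mode and in whether the image is drawn, which mirrors the two manual steps that follow: make the text visible, then delete the picture.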
Open the Content pane via View> Panes> Content. Click Options> Select> Text to select all the new text data.
The Content pane will show that all of the text data has been selected. Next, change the color values in the Properties pane (View> Panes> Properties) so that the text becomes visible.
Finally, select the underlying picture in the Content pane via Options> Select> Images, then press the Delete key to remove it. Once this is done, the document will contain only text. You can then go to Convert> To MS-Word:
If you have any further questions or concerns please do not hesitate to contact our support team at firstname.lastname@example.org