
OCR - any way of accessing the text overlay as a .txt doc?

Posted: Thu Sep 05, 2013 5:44 pm
by occam
Hi

I have a portable version of PDF-XChange Viewer (latest release) running under Windows 8. Using the OCR function, I am able to make a PDF searchable. Is there a way, however, to access the text overlay, e.g. as a .txt document or in any other format?

Thanks

Re: OCR - any way of accessing the text overlay as a .txt do

Posted: Thu Sep 05, 2013 6:44 pm
by Walter-Tracker Supp
OCR text is essentially the same as visible text, except that it is not rendered. You can extract it by selecting it with the mouse and copying/pasting, or you can use the Viewer's JavaScript support. I have attached a simple script that extracts the text from the current page and writes it to a text file.

Simply press Ctrl+J within the Viewer to bring up the JavaScript console, and paste in the contents of the attached script (a JavaScript file compressed with 7-Zip). Press the Run button and it will prompt you for an output filename under which to save the plain-text results. You can modify the script as you see fit, for example to save to a text file without user intervention.
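For anyone who can't get at the attachment, here is a minimal sketch of what such a script might look like. It is not the attached script. It assumes the Viewer implements the Adobe-style Doc methods `getPageNumWords`, `getPageNthWord`, `createDataObject`, `exportDataObject` and `removeDataObject`, and that `this` refers to the active document in the console; `extractPageText` and the attachment name `page.txt` are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: gathers the words (including OCR text) on one page
// via the Adobe-style getPageNumWords / getPageNthWord methods, which this
// sketch assumes the Viewer implements.
function extractPageText(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    // bStrip = false asks for each word with its punctuation kept
    words.push(doc.getPageNthWord(pageNum, i, false));
  }
  // Collapse whitespace runs so words end up separated by single spaces
  return words.join(" ").replace(/\s+/g, " ").trim();
}

// Only runs inside a viewer's JavaScript console (where `app` exists);
// `this` is assumed to be the active document there.
if (typeof app !== "undefined") {
  var text = extractPageText(this, this.pageNum);
  // Temporarily attach the text as "page.txt", then export it;
  // nLaunch: 0 should prompt for a save location without opening the file.
  this.createDataObject({ cName: "page.txt", cValue: text });
  this.exportDataObject({ cName: "page.txt", nLaunch: 0 });
  // Remove the temporary attachment again so the PDF is left unchanged
  this.removeDataObject("page.txt");
}
```

Collapsing whitespace keeps the output readable whether or not the viewer returns trailing spaces with each word, at the cost of losing line breaks; looping `pageNum` from 0 to `this.numPages - 1` would extend it to the whole document.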

Our Viewer replicates much of the functionality of the Adobe JavaScript API, so you can consult Adobe's reference manual for usage information:

http://www.adobe.com/devnet/acrobat/pdf ... erence.pdf

Re: OCR - any way of accessing the text overlay as a .txt do

Posted: Thu Sep 05, 2013 7:11 pm
by occam
Great, thanks Walter! I appreciate the quick feedback.

occam

Re: OCR - any way of accessing the text overlay as a .txt do

Posted: Thu Sep 05, 2013 8:34 pm
by Will - Tracker Supp
:)