OCR - any way of accessing the text overlay as a .txt doc?

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

occam
User
Posts: 5
Joined: Sun Aug 22, 2010 7:35 am

Post by occam »

Hi

I have a portable version of PDF-XChange Viewer (latest version) running under Windows 8. Using the OCR function, I am able to make a PDF searchable. Is there a way, however, to access the text overlay, e.g. as a .txt document or in any other format?

Thanks
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Post by Walter-Tracker Supp »

OCR text is essentially the same as visible text, except that it is not rendered. You can extract it by selecting it with the mouse and copying and pasting, or you can use the Viewer's JavaScript support. I have attached a simple script that extracts the text from the current page and writes it to a text file.
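
For illustration only (this is not the attached script, whose contents are not shown here), a minimal sketch of page-level extraction in the style of the Adobe JavaScript API. It assumes the Viewer implements the Adobe Doc methods `getPageNumWords(nPage)` and `getPageNthWord(nPage, nWord, bStrip)`; the helper name `extractPageText` is hypothetical. The document is passed in as a parameter so the logic can be tried outside the Viewer; in the console you would pass `this`.

```javascript
// Sketch only: assumes the Viewer supports the Adobe Doc methods
// getPageNumWords(nPage) and getPageNthWord(nPage, nWord, bStrip).
// extractPageText is a hypothetical helper, not part of any API.
function extractPageText(doc, pageNum) {
  var words = [];
  var count = doc.getPageNumWords(pageNum);
  for (var i = 0; i < count; i++) {
    // bStrip = false keeps punctuation; the regex trims surrounding whitespace
    var word = doc.getPageNthWord(pageNum, i, false).replace(/^\s+|\s+$/g, "");
    if (word.length > 0) {
      words.push(word);
    }
  }
  // Words are joined with single spaces, so original line breaks are lost.
  return words.join(" ");
}

// In the Viewer's console (Ctrl-J), for the page currently displayed:
// console.println(extractPageText(this, this.pageNum));
```

Joining word by word keeps the sketch simple at the cost of layout; the OCR layer's line structure is not reproduced.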

Simply press Ctrl-J within the Viewer to bring up the JavaScript console, then paste in the contents of the attached script (a JavaScript file compressed with 7-Zip, so extract it first). Press the Run button, and the script will prompt you for an output filename to save the plain-text results to. You can modify the script as you see fit, for example to save to a text file without user intervention.
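
As one hypothetical example of modifying such a script, the sketch below gathers text from every page and hands it to the Adobe `createDataObject`/`exportDataObject` pair, which in Acrobat opens a Save dialog for the embedded file. Whether PDF-XChange Viewer implements these two methods (and `numPages`) is an assumption to verify; note also that `createDataObject` adds an attachment to the open document and may be restricted in viewer-only products. The helper name `exportAllText` and the file name `ocr_text.txt` are illustrative.

```javascript
// Sketch only: assumes the Viewer implements the Adobe Doc members
// numPages, getPageNumWords, getPageNthWord, createDataObject, exportDataObject.
// exportAllText is a hypothetical helper name.
function exportAllText(doc, fileName) {
  var pages = [];
  for (var p = 0; p < doc.numPages; p++) {
    var words = [];
    var count = doc.getPageNumWords(p);
    for (var i = 0; i < count; i++) {
      // Keep punctuation, trim surrounding whitespace
      var word = doc.getPageNthWord(p, i, false).replace(/^\s+|\s+$/g, "");
      if (word.length > 0) words.push(word);
    }
    pages.push(words.join(" "));
  }
  // One line of text per page, separated by Windows line endings
  var text = pages.join("\r\n");
  // Embed the text as a data object, then export it; nLaunch: 0 means
  // the saved file is not opened afterwards (the user picks the location).
  doc.createDataObject({ cName: fileName, cValue: text });
  doc.exportDataObject({ cName: fileName, nLaunch: 0 });
  return text;
}

// In the console: exportAllText(this, "ocr_text.txt");
```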

Our Viewer replicates much of the functionality of the Adobe JavaScript API, so you can consult Adobe's reference manual for usage information:

http://www.adobe.com/devnet/acrobat/pdf ... erence.pdf
Attachments
extract_text.7z
(519 Bytes)
occam
User
Posts: 5
Joined: Sun Aug 22, 2010 7:35 am

Post by occam »

Great, thanks Walter! I appreciate the quick feedback.

occam
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Post by Will - Tracker Supp »

:)
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com