RFE: OCR feature with more settings

This Forum is for the use of End Users requiring help and assistance for Tracker Software's PDF-Tools.


Ginfer2
User
Posts: 8
Joined: Wed Dec 28, 2016 12:01 am

RFE: OCR feature with more settings

Post by Ginfer2 »

I have numerous PDF files whose text comes out as garbage when selected or copied, and I need to batch convert them into a readable format. PDF-Tools would help if the original garbage text layer did not remain in the document, so I would need a setting that creates a new PDF containing only the newly OCRed text layer, without any of the existing text layers. If I remember correctly, PDF Editor has this or a similar feature, so I hope this won't be too hard to implement.

BTW, thanks for this great product.
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: RFE: OCR feature with more settings

Post by Tracker Supp-Stefan »

Hello Ginfer2,

While the OCR tool in PDF-Tools does not have the same "Create new searchable PDF file" option as the Editor, you can create a custom tool instead. It should open the original files, convert them to images (this wipes the garbled OCR layer already in the files), and then create new files from those images, with an OCR step added during creation.
Alternatively, you can extract the original images from the existing PDF files and then create new PDFs from those, with an OCR step.

Regards,
Stefan
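
For readers who want to see the rasterize-then-OCR idea outside of PDF-Tools, here is a minimal Python sketch. It is not the PDF-Tools custom tool described above: it assumes the open-source command-line tools `pdftoppm` (Poppler) and `tesseract` are installed and on the PATH, and the function names (`rasterize_cmd`, `ocr_cmd`, `rebuild_with_ocr`) are made up for illustration. Rendering each page to an image discards every existing text layer, and OCR then adds a fresh one.

```python
import glob
import subprocess
from pathlib import Path


def rasterize_cmd(pdf: Path, prefix: Path, dpi: int = 300) -> list[str]:
    """Build the pdftoppm command that renders each page to '<prefix>-<n>.png'."""
    return ["pdftoppm", "-r", str(dpi), "-png", str(pdf), str(prefix)]


def ocr_cmd(image: Path, out_base: Path, lang: str = "eng") -> list[str]:
    """Build the tesseract command that writes a searchable '<out_base>.pdf'."""
    return ["tesseract", str(image), str(out_base), "-l", lang, "pdf"]


def rebuild_with_ocr(pdf: Path, work_dir: Path) -> list[Path]:
    """Rasterize `pdf`, OCR every page image, and return the per-page PDFs.

    Assumption: pdftoppm and tesseract are installed; this is a stand-in
    for the workflow in the post, not a PDF-Tools API.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    prefix = work_dir / pdf.stem
    subprocess.run(rasterize_cmd(pdf, prefix), check=True)

    pages = []
    # Sorting by name is expected to give page order, assuming pdftoppm
    # zero-pads the page numbers it appends to the prefix.
    pattern = f"{glob.escape(pdf.stem)}-*.png"
    for image in sorted(work_dir.glob(pattern)):
        out_base = image.with_suffix("")  # tesseract appends '.pdf' itself
        subprocess.run(ocr_cmd(image, out_base), check=True)
        pages.append(image.with_suffix(".pdf"))
    return pages
```

Calling `rebuild_with_ocr(Path("input.pdf"), Path("ocr_work"))` would leave one searchable PDF per page in `ocr_work`; merging them back into a single file is left out to keep the sketch short.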

Re: RFE: OCR feature with more settings

Post by Ginfer2 »

Thank you for your input. Strangely, it is not a garbled OCR layer: I noticed that those characters can be mapped 1:1 to existing ones (at least when it comes to ASCII).

/edit: The font is F16, apparently some strange Type 3 encoded stuff.

/edit2: I got a déjà vu vibe and then remembered that I had already posted a feature request for a similar issue (using my previous account): https://www.pdf-xchange.com/forum3 ... 920#p80920. Back then I was told to wait for the Editor SDK.
Attachments
xch_strange_chars_2.zip
Second zip with screenshot.
(74.35 KiB) Downloaded 112 times
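
If the copied characters really do correspond 1:1 to the intended ones, as described above, the text can in principle be repaired with a plain substitution table instead of OCR. The following is only a sketch under that assumption: the sample strings are invented (a shift-by-one mapping), not taken from the attached file, and `build_translation` is a hypothetical helper name.

```python
def build_translation(garbled_sample: str, correct_sample: str) -> dict[int, str]:
    """Derive a character substitution table from a pair of aligned samples.

    Both samples must have the same length, and each garbled character
    must always stand for the same correct character.
    """
    if len(garbled_sample) != len(correct_sample):
        raise ValueError("samples must be aligned character by character")

    mapping: dict[str, str] = {}
    for garbled, correct in zip(garbled_sample, correct_sample):
        # setdefault stores the first pairing and returns the stored value,
        # so a later conflicting pairing is detected here.
        if mapping.setdefault(garbled, correct) != correct:
            raise ValueError(f"{garbled!r} maps to more than one character")
    return str.maketrans(mapping)


# Hypothetical sample: what was copied vs. what the page visibly shows.
table = build_translation("Ifmmp", "Hello")

# Characters without a known mapping pass through unchanged.
repaired = "pmmfI".translate(table)
```

In practice the table would have to be built from a longer aligned sample covering every glyph the font uses, and characters without a mapping are left as they are.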

Re: RFE: OCR feature with more settings

Post by Tracker Supp-Stefan »

Hello Ginfer2,

Yes, it will be possible with the SDK products, but I presume you need to achieve that with the end-user ones?
Can we have a sample file, and not just the screenshot, so that we can run some tests locally?

Regards,
Stefan

Re: RFE: OCR feature with more settings

Post by Ginfer2 »

Let's continue the discussion of what I thought I needed this for in the thread "Bug: Copying text from certain PDFs with Type 3 fonts broken" in the XChange Editor support forum. Apparently what I'm seeing is an XChange Editor bug, because it does not appear in Adobe Reader (shame on me for not testing it with other PDF readers earlier).

This original RFE still stands on its own: while I don't need it in this case, I would argue that it would still be a nice feature to have for other tasks.

Oh, and I just noticed that I'm in the support and not the new feature request forum, sorry for that. Could you please move this thread to where it belongs? Also, sorry for being so all over the place in this thread.