OCR doesn't do anything

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


vsrawat
User
Posts: 7
Joined: Thu Sep 15, 2016 3:17 pm

OCR doesn't do anything

Post by vsrawat »

Today I downloaded, installed, and ran the Editor.

I gave it a 78-page PDF file (English only) and ran OCR on all pages.

It took hours and ended without doing anything or creating an OCR output file.

I did this again, this time for a single page, and again it did nothing.

It does not even ask me where to save the OCR-ed file. I don't know where it saved that file, if it created one at all.

The input PDF file contained only English text (no images), so it should have just read the ASCII letters.
I don't know why it went ahead and actually OCR-ed those pages.

Thanks.
Last edited by vsrawat on Thu Sep 15, 2016 3:39 pm, edited 1 time in total.
Tracker Supp-Stefan
Site Admin
Posts: 17822
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR doesn't do anything, just wasted time

Post by Tracker Supp-Stefan »

Hello vsrawat,

Welcome to our forums.
The default action when you OCR a PDF file with our tool would be to add an invisible layer of text over the existing image, in the existing file. So that is why no new file is created, and why it seems as if nothing happened. Please try using the "text select" tool now - and you should be able to select your text. You can also use the search tools, and they should now find text in your document.

Regards,
Stefan
vsrawat
User
Posts: 7
Joined: Thu Sep 15, 2016 3:17 pm

Re: OCR doesn't do anything, just wasted time

Post by vsrawat »

The input PDF is already a fully extractable text file.
I could select the entire text without running OCR, so I couldn't tell whether anything had changed.

I ran OCR because some characters, such as " and ', were coming out as junk when I selected the text normally, so I thought OCR would be able to recognise them correctly.

I would say this method is very complicated, and the software doesn't give any message anywhere
about the creation of this invisible layer or how to proceed with it.

It would have been much simpler and easier to understand and handle if it had just created a txt or docx file on disk containing the OCR-ed text.

Thanks.
vsrawat
User
Posts: 7
Joined: Thu Sep 15, 2016 3:17 pm

Re: OCR doesn't do anything

Post by vsrawat »

Also, it should at least clean up the text,

like

- merging the separate lines of a single paragraph into a single line, by removing the extra CR-LFs that appear in the PDF.
- putting the header and footer only on the first page, or wherever they change, and removing them from all other pages.
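
The two cleanup steps listed above can be sketched outside the Editor, on text that has already been copied or exported from the PDF. This is only a minimal Python sketch, not how PDF-XChange Editor works: the function names are hypothetical, paragraphs are assumed to be separated by blank lines, and the header and footer are assumed to be the first and last non-empty line of each page.

```python
import re


def merge_paragraph_lines(text: str) -> str:
    """Join the lines of each paragraph into one line.

    Assumption: paragraphs are separated by at least one blank line.
    """
    text = text.replace("\r\n", "\n")
    paragraphs = re.split(r"\n\s*\n", text)
    merged = [
        " ".join(line.strip() for line in para.split("\n") if line.strip())
        for para in paragraphs
    ]
    return "\n\n".join(para for para in merged if para)


def dedupe_headers_footers(pages: list[str]) -> list[str]:
    """Drop a page's header/footer when it equals the previous page's.

    Assumption: the header is the first non-empty line of a page and the
    footer is the last one. A header or footer that changes is kept.
    """
    cleaned = []
    prev_header = prev_footer = None
    for page in pages:
        lines = page.replace("\r\n", "\n").split("\n")
        # Trim blank lines at both ends so the edge lines are real text.
        while lines and not lines[0].strip():
            lines.pop(0)
        while lines and not lines[-1].strip():
            lines.pop()
        header = lines[0].strip() if lines else None
        footer = lines[-1].strip() if len(lines) > 1 else None
        if header is not None and header == prev_header:
            lines = lines[1:]
        if footer is not None and footer == prev_footer and lines:
            lines = lines[:-1]
        prev_header, prev_footer = header, footer
        cleaned.append("\n".join(lines))
    return cleaned
```

A footer that contains a page number differs on every page, so this sketch would leave it in place; handling that case would need an extra rule.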

I think adding the OCR-ed text in a new layer is cryptic; users would not like that and would rather have it the way I need it.

Thanks.
vsrawat
User
Posts: 7
Joined: Thu Sep 15, 2016 3:17 pm

Re: OCR doesn't do anything

Post by vsrawat »

I opened an image PDF file in the Editor and then in the Viewer,
but the "OCR Pages" menu option appears dimmed (not active) in both.

So it seems it doesn't OCR images; it only runs OCR when the file is already text. What is the purpose, then?

The PDF file containing the image is attached.

Thanks.
Attachments
AMPR86958463_2016-09-11_11-41-26.pdf
(34 KiB) Downloaded 199 times
Willy Van Nuffel
User
Posts: 2347
Joined: Wed Jan 18, 2006 12:10 pm

Re: OCR doesn't do anything

Post by Willy Van Nuffel »

Hello,

OCR (Optical Character Recognition) is a feature that can convert a scanned page (a photo or image of a page) into a page with a real text layer, so that you can select text and the page also becomes 'searchable'. If you open a PDF with scanned pages that have not yet been OCR'ed, you cannot select any text in it. When you zoom in on the pages, you will probably see that they have been scanned. The text that you see will be of low(er) quality.

The example PDF that you sent has been 'secured' against modifications, copying, ... by a password.
You can verify this in PDF-XChange Editor, when the PDF is open, via File > Document Properties > Security.
As a consequence, it is not even possible to run OCR on it.
On the other hand, running OCR on that PDF makes no sense, because it contains real text. The content does not originate from a scanner.
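
Outside the Editor, a rough way to check whether a PDF carries security settings is to look for an /Encrypt entry, which encrypted PDF files reference from their trailer. The Python sketch below is a heuristic only: the function names are hypothetical, it does not parse the file, and it cannot tell which permissions (copying, modifying, ...) are restricted.

```python
def looks_encrypted(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes contain an /Encrypt name.

    Assumption: the name is written literally. A false positive is
    possible if the same bytes happen to occur elsewhere in the file.
    """
    return b"/Encrypt" in pdf_bytes


def file_looks_encrypted(path: str) -> bool:
    """Apply the same heuristic to a file on disk."""
    with open(path, "rb") as fh:
        return looks_encrypted(fh.read())
```

The Document Properties > Security dialog mentioned above remains the reliable check; this sketch is only a quick first look when many files have to be screened.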

Another thing that may be of interest to you is that, starting from the current version 6.0 - Build 318.0 of PDF-XChange Editor, you can convert a PDF to a Word document (via File > Save As), on the condition that the PDF is not secured.

Best regards.
Patrick-Tracker Supp
Site Admin
Posts: 1645
Joined: Thu Mar 27, 2014 6:14 pm
Location: Vancouver Island

Re: OCR doesn't do anything

Post by Patrick-Tracker Supp »

:)
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.

Cheers,

Patrick Charest
Tracker Support North America
vsrawat
User
Posts: 7
Joined: Thu Sep 15, 2016 3:17 pm

Re: OCR doesn't do anything

Post by vsrawat »

I joined here yesterday and posted in this sub-forum.

Now I see there is a specific sub-forum for the OCR plugin.

Could an admin please move this thread to that sub-forum?

Thanks.
Will - Tracker Supp
Site Admin
Posts: 6815
Joined: Mon Oct 15, 2012 9:21 pm
Location: London, UK

Re: OCR doesn't do anything

Post by Will - Tracker Supp »

Hi vsrawat,

The post has now been moved.

Cheers,

Best regards

Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com