OCR doesn't do anything
I downloaded, installed, and ran the Editor today.
I gave it a 78-page PDF file (English only) and proceeded to OCR all pages.
It took hours and ended without doing anything or creating an OCR output file.
I did this again, this time for a single page, and it did nothing.
It does not even ask me where to save the OCR-ed file. I don't know where it saved the file, if it created one at all.
The input PDF file had only English text (not images), so it should have just read the ASCII letters.
I don't know why it went ahead and actually OCR-ed those pages.
Thanks.
Last edited by vsrawat on Thu Sep 15, 2016 3:39 pm, edited 1 time in total.
- Tracker Supp-Stefan
- Site Admin
- Posts: 17929
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: OCR doesn't do anything, just wasted time
Hello vsrawat,
Welcome to our forums.
The default action when you OCR a PDF file with our tool is to add an invisible layer of text over the existing image, in the existing file. That is why no new file is created, and why it seems as if nothing happened. Please try using the "text select" tool now; you should be able to select your text. You can also use the search tools, and they should now find text in your document.
Regards,
Stefan
Re: OCR doesn't do anything, just wasted time
The input PDF is already a fully extractable text file.
I could select the entire text without doing OCR, so I couldn't tell whether anything had changed.
I ran OCR because some characters, like " and ', came out as junk in text selected the normal way, so I thought OCR would be able to recognise them correctly.
I would say this method is very complicated, and the software doesn't give any message anywhere about this invisible layer creation, or about how to proceed with it.
It would have been much simpler, and easier to understand and handle, if it had just created a txt or docx file on disk containing the OCR-ed text.
Thanks.
Re: OCR doesn't do anything
Also, it should at least clean up the text, for example:
- merging the different lines of a single paragraph into a single line, by removing the extra CR-LFs that come from the PDF.
- putting the header and footer only on the first page, or wherever they change, and removing them from all other pages.
I think adding the OCR-ed text in a new layer is cryptic; users would not like that and would rather have it the way I need.
Thanks.
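The two cleanup steps listed above can be sketched in a few lines of Python, applied to text that has already been extracted from the PDF. This is only a minimal illustration, not how PDF-XChange Editor works internally. The function names `merge_wrapped_lines` and `strip_repeated_edges` are hypothetical, and the sketch assumes that a blank line separates paragraphs and that each page carries a single-line header (its first line) and a single-line footer (its last line).

```python
import re


def merge_wrapped_lines(text: str) -> str:
    """Join the hard-wrapped lines of each paragraph into one line.

    Assumption: paragraphs are separated by at least one blank line.
    Words hyphenated across a line break are NOT rejoined.
    """
    text = text.replace("\r\n", "\n")          # normalise CR-LF to LF
    paragraphs = re.split(r"\n\s*\n", text)    # blank line = paragraph break
    merged = [
        " ".join(line.strip() for line in p.split("\n") if line.strip())
        for p in paragraphs
    ]
    return "\n\n".join(p for p in merged if p)


def strip_repeated_edges(pages: list[list[str]]) -> list[list[str]]:
    """Keep a header/footer on the first page and wherever it changes.

    Assumption: the header is the first line of a page and the footer is
    the last line. A line is dropped only when it is identical to the
    header/footer of the previous page.
    """
    cleaned = []
    prev_header = prev_footer = None
    for lines in pages:
        header = lines[0] if lines else None
        footer = lines[-1] if len(lines) > 1 else None
        body = list(lines)
        if footer is not None and footer == prev_footer:
            body.pop()       # same footer as the previous page: drop it
        if header is not None and header == prev_header:
            body.pop(0)      # same header as the previous page: drop it
        prev_header, prev_footer = header, footer
        cleaned.append(body)
    return cleaned
```

Footers that contain a page number differ on every page, so this sketch leaves them in place; handling them would need a looser comparison than exact equality.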
Re: OCR doesn't do anything
I opened an image PDF file in the Editor and then in the Viewer, but the "ocr pages" menu option is dimmed (not active) in both.
So it seems it doesn't OCR images; it only OCRs when the file is already text. What is the purpose, then?
The PDF file containing the image is attached.
Thanks.
Attachments:
- AMPR86958463_2016-09-11_11-41-26.pdf (34 KiB)
- User
- Posts: 2394
- Joined: Wed Jan 18, 2006 12:10 pm
Re: OCR doesn't do anything
Hello,
OCR (Optical Character Recognition) is a feature that can convert a scanned page (a photo or image of a page) into a page with a real text layer, so that you can select text and the page also becomes 'searchable'. If you open a PDF with scanned pages that have not yet been OCR'ed, you cannot select any text in it. When you zoom in on the pages, you will probably see that they have been scanned: the text will be of low(er) quality.
The example PDF that you sent has been 'secured' by a password against modification, copying, and so on.
You can verify this in PDF-XChange Editor, with the PDF open, via File > Document Properties > Security.
Consequently, it is not even possible to use OCR on it.
On the other hand, running OCR on that PDF makes no sense, because it contains real text. The content did not come from a scanner.
Another thing that may be of interest to you: starting with the current version, 6.0 - Build 318.0, of PDF-XChange Editor, you can convert a PDF to a Word document (via File > Save As), provided the PDF is not secured.
Best regards.
- Patrick-Tracker Supp
- Site Admin
- Posts: 1645
- Joined: Thu Mar 27, 2014 6:14 pm
- Location: Vancouver Island
Re: OCR doesn't do anything
If posting files to this forum, you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded.
Thank you.
Cheers,
Patrick Charest
Tracker Support North America
Re: OCR doesn't do anything
I joined here yesterday and posted in this subforum.
Now I see there is a specific subforum for the OCR plugin.
Could an admin please move this thread to that subforum?
Thanks.
- Will - Tracker Supp
- Site Admin
- Posts: 6815
- Joined: Mon Oct 15, 2012 9:21 pm
- Location: London, UK
Re: OCR doesn't do anything
Hi vsrawat,
The post has now been moved.
Cheers,
Best regards
Will Travaglini
Tracker Support (Europe)
Tracker Software Products Ltd.
http://www.tracker-software.com