Support request for PDF-XChange PRO Editor Plus, version 10.1.2, build 382 (Enhanced OCR).
After performing OCR on a PDF document, the software:
• changes characters, letters, and fonts
• changes the formatting of the font
• changes the formatting of sentences
• changes the line spacing, with some lines randomly disappearing
• changes the font to illegible, non-English characters
This has happened on multiple documents. Please advise.
OCR changes English font to illegible characters
-
- User
- Posts: 5
- Joined: Wed Mar 13, 2024 8:07 pm
-
- Site Admin
- Posts: 6903
- Joined: Wed Mar 25, 2009 10:37 pm
- Location: Chemainus, Canada
Re: OCR changes English font to illegible characters
Hi, philjv
There are so many variables involved in the OCR process that it is hard to say exactly what is happening. The most likely cause is that the font used in the original may not be available on your system, in which case a "font substitution" must be done.
May we see a sample PDF before OCR is performed please?
Kind regards,
Paul - Tracker Supp
Best regards
Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
-
- User
- Posts: 5
- Joined: Wed Mar 13, 2024 8:07 pm
Re: OCR changes English font to illegible characters
As an example, please see the attached files from before and after OCR, showing how the font changed after OCR.
-
- Site Admin
- Posts: 8624
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR changes English font to illegible characters
Hello, philjv
I cannot seem to locate the illegible characters you describe. With the exception of a few bullet points that are not converted to more uniform objects, and some table lines that are partially removed, the OCR'ed version looks considerably more legible overall than the original. Below are a few "blink test" GIFs for comparison.
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
+++++++++++++++++++++++++++++++++++
Our Web site domain and email address has changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
-
- User
- Posts: 5
- Joined: Wed Mar 13, 2024 8:07 pm
Re: OCR changes English font to illegible characters
Hello Dan,
Thank you for your response. The examples I provided yesterday were meant only to show the font changes after OCR, along with some table properties that also changed. They were not intended to illustrate the other issues.
-
- Site Admin
- Posts: 8624
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR changes English font to illegible characters
Hello, philjv
I see. In that case, from a font perspective, this is well within an acceptable margin of error. The original document font is "stretched" in height, and in every case I compared, once that height stretch is taken into account, it does appear to be the same font. OCR is not able to apply distortions to the text (yet); it simply finds the closest available font and places characters in the corresponding locations, while trying to keep the same position relative to their neighbors.
Regarding the missing table lines, this is an issue that our Devs are working on, but it is a long-term, gradual-improvement kind of task.
Kind regards,
Dan McIntyre - Support Technician
-
- User
- Posts: 5
- Joined: Wed Mar 13, 2024 8:07 pm
Re: OCR changes English font to illegible characters
Preserving the original font and formatting is standard behavior expected of any OCR functionality, whether in PDF-XChange or other software. It is especially expected in software advertised with "Enhanced OCR."
Please advise how to maintain the original font and properties after OCR without making any unauthorized changes to the document.
-
- Site Admin
- Posts: 8624
- Joined: Wed Jan 03, 2018 6:52 pm
Re: OCR changes English font to illegible characters
Hello, philjv
If you are performing OCR on a document for the purpose of submitting it to the courts, you should never use the "editable" option, as this can and will make changes to the document content, invalidating any signatures present.
You will need to use the "searchable text" OCR option instead, which leaves the original page intact and adds invisible text content overlaid on the respective area of the page. Do note that, as I have already mentioned in this thread, OCR is not a perfect system and mistakes can be made. This document has a number of blemishes, as well as handwritten text, which can confuse OCR systems further. All of this means that even for searchable purposes, there may still be mistakes.
Kind regards,
Dan McIntyre - Support Technician