OCR changes English font to illegible characters

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer

philjv
User
Posts: 5
Joined: Wed Mar 13, 2024 8:07 pm

Post by philjv »

Support request for PDF-XChange PRO Editor Plus, version 10.1.2, build 382 (Enhanced OCR).

After performing OCR on a PDF document, the software:
• changes characters and fonts
• changes font formatting
• changes sentence formatting
• changes line spacing, with some lines disappearing at random
• replaces text with illegible characters (not English)

This has happened on multiple documents. Please advise.
Paul - Tracker Supp
Site Admin
Posts: 6903
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Post by Paul - Tracker Supp »

Hi, philjv

There are so many variables involved in the OCR process that it is hard to say exactly what is happening. The most likely cause is that the font used in the original is not available on your system, so a "font substitution" has to be made.

May we see a sample PDF from before OCR is performed, please?

Kind regards,

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
philjv
User
Posts: 5
Joined: Wed Mar 13, 2024 8:07 pm

Re: OCR changes English font to illegible characters

Post by philjv »

As an example, please see the attached files from before and after OCR; the font changed after OCR.
TxDOT Spec Item 512 Portable Traffic Barrier ORIGINAL BEFORE OCR.pdf
TxDOT Spec Item 512 Portable Traffic Barrier CHANGED AFTER OCR.pdf
TrackerSupp-Daniel
Site Admin
Posts: 8624
Joined: Wed Jan 03, 2018 6:52 pm

Post by TrackerSupp-Daniel »

Hello, philjv

I cannot seem to locate the illegible characters you mention. With the exception of a few bullet points that were not converted to more uniform objects, and some table lines that were partially removed, the OCR'ed version looks considerably more legible overall than the original. Below are a few "blink test" GIFs for comparison:
PDFXEdit_e71wc9F6kc.gif
PDFXEdit_hu7DGWibNE.gif
PDFXEdit_WgsIQG5wqn.gif
Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

+++++++++++++++++++++++++++++++++++
Our website domain and email address have changed as of 26/10/2023.
https://www.pdf-xchange.com
Support@pdf-xchange.com
philjv
User
Posts: 5
Joined: Wed Mar 13, 2024 8:07 pm

Post by philjv »

Hello Dan,

Thank you for your response. The examples I provided yesterday were meant to show only the font changes after OCR, along with some table properties that also changed. They were not meant to illustrate the other issues.
TrackerSupp-Daniel
Site Admin
Posts: 8624
Joined: Wed Jan 03, 2018 6:52 pm

Post by TrackerSupp-Daniel »

Hello, philjv

I see. In that case, from a font perspective, this is well within an acceptable margin of error. The font in the original document is "stretched" in height, and in every case I compared, once that height stretch is taken into account, it does appear to be the same font. OCR is not able to apply distortions to the text (yet); it simply finds the closest available font and places characters in that location, while trying to keep each character in the same position relative to its neighbors.
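
As a rough sketch of that kind of comparison (illustrative only: the glyph measurements, the stretch factor of 1.2 and the function name are hypothetical, and this is not how the OCR engine itself works), one can divide the vertical stretch out of the original glyph boxes and then compare width-to-height proportions:

```python
def same_proportions(original, ocr_output, vertical_stretch, tolerance=0.05):
    """Return True if every glyph keeps its width/height ratio once the
    vertical stretch of the original is divided out.

    original, ocr_output: lists of (width, height) glyph boxes, same order.
    vertical_stretch: assumed height scaling applied to the original text.
    """
    if len(original) != len(ocr_output):
        return False
    for (orig_w, orig_h), (new_w, new_h) in zip(original, ocr_output):
        # Undo the assumed stretch before comparing proportions.
        unstretched_ratio = orig_w / (orig_h / vertical_stretch)
        new_ratio = new_w / new_h
        if abs(unstretched_ratio - new_ratio) > tolerance * new_ratio:
            return False
    return True


# Hypothetical measurements in points, not taken from the attached files.
original_glyphs = [(5.0, 12.0), (6.0, 12.0)]   # assumed 1.2x taller than normal
ocr_glyphs = [(5.0, 10.0), (6.0, 10.0)]        # undistorted substitute font

print(same_proportions(original_glyphs, ocr_glyphs, vertical_stretch=1.2))
print(same_proportions(original_glyphs, ocr_glyphs, vertical_stretch=1.0))
```

With the stretch accounted for, the proportions agree; treated as unstretched text, the same glyphs look like a different font.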

Regarding the missing table lines, this is an issue that our Devs are working on, but it is a long-term task that will improve gradually.

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD

philjv
User
Posts: 5
Joined: Wed Mar 13, 2024 8:07 pm

Post by philjv »

Return of Service - Texas Underground Facility Notification Corp (Texas 811) OCR - AUTO-CHANGED BY PDF-X SOFTWARE.pdf
Return of Service - Texas Underground Facility Notification Corp (Texas 811) ORIGINAL BEFORE OCR.pdf
Here is another example: an original signed document before OCR, and the same document after OCR. The OCR in PDF-XChange PRO Editor Plus, version 10.1.2, build 382 (Enhanced OCR), changed the font, which makes the OCR'd document unusable because the changes were not approved by the author of the original signed document. The rules of most courts require this document to be OCR'd before it is filed in a court's electronic filing system, but a document that has been changed in any way after signing, without authorization, cannot be filed with a court.

This is a standard use expected of any OCR functionality, whether in PDF-XChange or other software, and it is certainly expected of software with "Enhanced OCR."

Please advise on how to keep the original font and properties after OCR, without any unauthorized changes to the document.
TrackerSupp-Daniel
Site Admin
Posts: 8624
Joined: Wed Jan 03, 2018 6:52 pm

Post by TrackerSupp-Daniel »

Hello, philjv

If you are performing OCR on a document in order to submit it to the courts, you should never use the "editable" option, as it can and will change the document content, invalidating any signatures present.

You will need to use the "searchable text" OCR option instead, which leaves the original page intact and adds invisible text content overlaid on the respective area of the page. Do note that, as I have already mentioned in this thread, OCR is not a perfect system and mistakes can be made. This document has a number of blemishes, as well as handwritten text, which can confuse OCR systems further. All of this means that even for searchable purposes, there may still be mistakes.
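
To illustrate the difference between the two modes, here is a toy model (a sketch only: the `Page` class and both functions are hypothetical and are not part of PDF-XChange or any PDF library). It treats a page as a scanned image plus text runs, and fingerprints only what a reader can see:

```python
import hashlib
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Page:
    """Toy stand-in for a PDF page (hypothetical, not a real PDF object)."""
    image: bytes                 # the scanned page as the signer saw it
    visible_text: tuple = ()     # text drawn on the page
    hidden_text: tuple = ()      # invisible text used only for searching


def visible_fingerprint(page):
    """Hash everything a reader can see: the image and any visible text."""
    digest = hashlib.sha256(page.image)
    for run in page.visible_text:
        digest.update(run.encode("utf-8"))
    return digest.hexdigest()


def ocr_searchable(page, recognized):
    """Keep the page as it is and add the recognized words as hidden text."""
    return replace(page, hidden_text=tuple(recognized))


def ocr_editable(page, recognized):
    """Replace the scanned image with re-typeset, visible text."""
    return replace(page, image=b"", visible_text=tuple(recognized))


scan = Page(image=b"scanned-page-bytes")
words = ["Return", "of", "Service"]

searchable = ocr_searchable(scan, words)
editable = ocr_editable(scan, words)

print(visible_fingerprint(searchable) == visible_fingerprint(scan))
print(visible_fingerprint(editable) == visible_fingerprint(scan))
```

In this model the searchable result keeps the same visible fingerprint as the scan, while the editable result does not, which is why only the searchable option is suitable for a signed document.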

Kind regards,
Dan McIntyre - Support Technician
Tracker Software Products (Canada) LTD
