(Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata  SOLVED

PDF-XChange Editor SDK for Developers

zarkogajic
User
Posts: 1372
Joined: Thu Sep 05, 2019 12:35 pm

(Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by zarkogajic »

Hi Support,

It seems that in one of the last few releases, the extension of the language files for the Default OCR engine (the one available in the SDK) changed from ".dat" to ".traineddata".

Can you confirm that this is the only change (related to using the default OCR from the SDK)?

Are the same files used for both x86 and x64 (previously, with .dat, this was the case)?


p.s.
For new readers, this is a follow-up of sorts to this topic: viewtopic.php?t=33535

-žarko
Tracker Supp-Stefan
Site Admin
Posts: 17960
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by Tracker Supp-Stefan »

Hello zarkogajic,

I've passed your enquiry to our devs working on the OCR engines, and we will post an update here as soon as one is available!

Kind regards,
Stefan
Vasyl-Tracker Dev Team
Site Admin
Posts: 2353
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by Vasyl-Tracker Dev Team »

It occurred because we updated the Tesseract engine. The new Tesseract uses *.traineddata files instead of the older *.dat files. Unfortunately, it seems the two formats are incompatible. And yes, both x86 and x64 use the same lang files.
Vasyl Yaremyn
Tracker Software Products
Project Developer

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
zarkogajic
User
Posts: 1372
Joined: Thu Sep 05, 2019 12:35 pm

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by zarkogajic »

Hi Vasyl,

Thanks

Btw, you wrote: "And seems both formats are incompatible, unfortunately."
What do you mean?

I've simply renamed the file extension and all seems to work.
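
For anyone who wants to try the same thing, here is a minimal Python sketch of that rename, written as a copy so the original .dat files stay untouched. It is not part of the SDK: the function name is hypothetical, and it only assumes that the language files sit together in one folder. It is also not confirmed that files renamed this way contain everything the newer engine needs.

```python
from pathlib import Path
import shutil


def copy_dat_as_traineddata(folder: Path) -> list[Path]:
    """Copy every *.dat file in `folder` to a sibling *.traineddata file.

    The original *.dat files are left in place, and an existing
    *.traineddata file is never overwritten.
    Returns the paths of the newly created files.
    """
    created = []
    for src in sorted(folder.glob("*.dat")):
        dst = src.with_suffix(".traineddata")
        if dst.exists():
            # A file with the new extension is already there; keep it.
            continue
        shutil.copy2(src, dst)  # copy2 also preserves timestamps
        created.append(dst)
    return created
```

Calling it with the path of your own Tesseract language folder returns the list of files it created, so you can see exactly which languages were carried over.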

-žarko
Tracker Supp-Stefan
Site Admin
Posts: 17960
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by Tracker Supp-Stefan »

Hello zarkogajic,

I will ask Vasyl to clarify. However, if simply renaming the files works for you, that's great!

Kind regards,
Stefan
zarkogajic
User
Posts: 1372
Joined: Thu Sep 05, 2019 12:35 pm

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by zarkogajic »

Ping :)

-žarko
Vasyl-Tracker Dev Team
Site Admin
Posts: 2353
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by Vasyl-Tracker Dev Team »

I will ask my colleagues about your tricky method. I'm still not sure it is correct to just rename the old lang files...
Vasyl Yaremyn
Tracker Software Products
Project Developer

Vasyl-Tracker Dev Team
Site Admin
Posts: 2353
Joined: Thu Jun 30, 2005 4:11 pm
Location: Canada

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by Vasyl-Tracker Dev Team »

Our dev said that it was an upgrade of the Tesseract modules from v4 to the newer v5, and the newer Tesseract uses a different format for lang files. Technically, the container is the same, but the data inside might be different. So it looks like Tesseract v5 is able to open and read those files, but there is a chance that some necessary data might be absent...
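
Given that caveat, it may help to know which languages in a folder still exist only as legacy .dat files, so they can be replaced with proper newer files rather than renamed. The following Python sketch is only an illustration, not SDK functionality: the function name is hypothetical, and it compares file names only. It cannot tell whether a .traineddata file is a renamed older file or a genuine newer one.

```python
from pathlib import Path


def split_language_files(folder: Path) -> dict[str, list[str]]:
    """Group language names by which file extensions exist in `folder`.

    Only file names are compared; the file contents are never inspected.
    """
    dat = {p.stem for p in folder.glob("*.dat")}
    trained = {p.stem for p in folder.glob("*.traineddata")}
    return {
        "only_dat": sorted(dat - trained),          # legacy file only
        "only_traineddata": sorted(trained - dat),  # new extension only
        "both": sorted(dat & trained),              # both extensions present
    }
```

Languages listed under "only_dat" are the ones that would need either a rename or a replacement file before the updated engine can find them.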
Vasyl Yaremyn
Tracker Software Products
Project Developer

zarkogajic
User
Posts: 1372
Joined: Thu Sep 05, 2019 12:35 pm

Re: (Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata  SOLVED

Post by zarkogajic »

Hi Vasyl,

Clear, thanks. Case closed.

-žarko
Tracker Supp-Stefan
Site Admin
Posts: 17960
Joined: Mon Jan 12, 2009 8:07 am
Location: London

(Default) OCR in SDK, .dat files in Tesseract folder -> .traineddata

Post by Tracker Supp-Stefan »

:)