Hi there,
I would like to know whether the OCR SDK can recognize a document with mixed languages, such as English + Traditional Chinese.
Regards,
Norman
OCR for mixed language documents
-
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: OCR for mixed language documents
I am afraid there are currently no plans to provide this.
If posting files to this forum, you must archive them to a ZIP, RAR or 7z file or they will not be uploaded. Thank you.
Best regards
Tracker Support
http://www.tracker-software.com
-
- User
- Posts: 2
- Joined: Thu Dec 20, 2012 1:56 pm
Re: OCR for mixed language documents
Hi John,
So we can only recognize one language for a document at a time, correct? Does this language include the numeric characters?
Thanks.
Norman
-
- Site Admin
- Posts: 6903
- Joined: Wed Mar 25, 2009 10:37 pm
- Location: Chemainus, Canada
Re: OCR for mixed language documents
Hi cyberguy321
Certainly, Latin-based languages should recognize Arabic numerals (1, 2, 3, 4, 5, 6, 7, 8, 9, 0). Similarly, I would expect Chinese to recognize the numeric kanji (is that the right word? I know it is in Japanese, but I'm not sure about Chinese) as it would any other word or character.
I haven't tried this, and the OCR expert isn't available today, but have you tried running the OCR twice, once in each language? Be sure to select "Preserve Original Content & Add Text layer" so you don't lose your previous results. I'd be keen to hear how that goes.
regards
Best regards
Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
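For anyone scripting this, the "run the OCR twice, once in each language" idea can be sketched roughly as below. This is only an illustration of the control flow, not the OCR SDK's API: the `recognize` callable, its `(page, language)` signature, and the language names are all hypothetical placeholders that you would replace with whatever your OCR tool actually provides.

```python
from typing import Callable, Dict, Iterable

# Hypothetical recognizer signature (an assumption, not the OCR SDK's API):
# it takes a page object and a language name and returns the recognized text.
Recognizer = Callable[[object, str], str]


def ocr_each_language(
    page: object,
    languages: Iterable[str],
    recognize: Recognizer,
) -> Dict[str, str]:
    """Run one OCR pass per language and keep every result.

    Each pass is stored under its own language key, so a later pass never
    overwrites an earlier one. This mirrors the advice to preserve the
    original content and add a new text layer on each run.
    """
    layers: Dict[str, str] = {}
    for language in languages:
        layers[language] = recognize(page, language)
    return layers
```

You would call it with your own recognizer, for example `ocr_each_language(page, ["English", "Chinese (Traditional)"], my_recognizer)`, and then compare the per-language results to see which pass handled which parts of the page.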
-
- User
- Posts: 1
- Joined: Wed Sep 11, 2013 2:34 pm
Re: OCR for mixed language documents
You should add this function; it is very important. All my scanned PDF documents are written in two languages. I'm ready to buy the program if it has this capability, but right now there are no alternatives.
-
- User
- Posts: 381
- Joined: Mon Jun 13, 2011 5:10 pm
Re: OCR for mixed language documents
There is some limited support for recognizing mixed Chinese (Traditional or Simplified) and Latin script. I would recommend trying the free PDF-XChange Viewer (from our downloads page) with the Chinese language package on some sample documents, as it uses the same underlying OCR engine and languages as our SDK. One caveat: the Viewer does not have the automatic deskew that is available in the SDK, so for best results use pages that are already level. Select one of the Chinese languages in the OCR options when you start an OCR job. Note that OCR with Chinese characters can take a little longer than Latin character recognition.