OCR for mixed language documents

Posted: Thu Dec 20, 2012 2:09 pm
by cyberguy321
Hi there,

I would like to know whether the OCR SDK can recognize a document that mixes languages, such as English and Traditional Chinese.

Regards,

Norman

Re: OCR for mixed language documents

Posted: Thu Dec 20, 2012 2:39 pm
by John - Tracker Supp
I am afraid there are no plans to provide this at this time.

Re: OCR for mixed language documents

Posted: Sat Dec 22, 2012 9:30 am
by cyberguy321
Hi John,

So we can only recognize one language per document at a time, correct? Does the language setting also cover numeric characters?

Thanks.

Norman

Re: OCR for mixed language documents

Posted: Sun Dec 23, 2012 11:12 pm
by Paul - Tracker Supp
Hi cyberguy321

Latin-based languages should certainly recognize Arabic numerals (1, 2, 3, 4, 5, 6, 7, 8, 9, 0). Similarly, I would expect Chinese to recognize the numeric characters just as it would any other word or character. (In Japanese these would be kanji; I'm not sure that is the right word in Chinese.)

I haven't tried this myself, and the OCR expert isn't available today, but have you tried running OCR twice, once in each language? Be sure to select "Preserve Original Content & Add Text layer" so you don't lose the results of the first pass. I'd be keen to hear how that goes.

Regards

Re: OCR for mixed language documents

Posted: Wed Sep 11, 2013 2:38 pm
by pyrrolidine
You should add this feature; it is very important. All my scanned PDF documents are written in two languages. I'm ready to buy the program if it offers this capability, but there are no alternatives.

Re: OCR for mixed language documents

Posted: Wed Sep 11, 2013 5:30 pm
by Walter-Tracker Supp
There is some limited support for recognizing Chinese (Traditional or Simplified) mixed with Latin script. I recommend trying the free PDF-XChange Viewer (from our downloads page) with the Chinese language package on some sample documents, as it uses the same underlying OCR engine and languages as our SDK. One caveat: the Viewer does not have the automatic deskew that is available in the SDK, so for best results use pages that are already level. Select one of the Chinese languages in the OCR options when you start an OCR job. Note that OCR of Chinese characters can take somewhat longer than recognition of Latin characters.