OCR for mixed language documents

Post by **cyberguy321** » Thu Dec 20, 2012 2:09 pm

Hi there,

I would like to know whether the OCR SDK can recognize a document with mixed languages, such as English + Traditional Chinese.

Regards,

Norman

Post by **John - Tracker Supp** » Thu Dec 20, 2012 2:39 pm

I'm afraid there are no current plans to provide this.

Post by **cyberguy321** » Sat Dec 22, 2012 9:30 am

Hi John,

So we can only recognize one language for a document at a time, correct? Does this language include the numeric characters?

Thanks.

Norman

Sun Dec 23, 2012 11:12 pm

Hi cyberguy321

Certainly, Latin-based languages should recognize Arabic numerals (1, 2, 3, 4, 5, 6, 7, 8, 9, 0). Similarly, I would expect Chinese to recognize the numeric kanji (is that the right word? I know it is in Japanese, but I'm not sure about Chinese) as it would any other word or character.

I haven't tried this, and the OCR expert isn't available today, but have you tried running the OCR twice, once in each language? Be sure to select "Preserve Original Content & Add Text layer" so you don't lose your previous results. I'd be keen to hear how that goes.

regards
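
For anyone scripting the "run the OCR twice, once in each language" idea above, here is a minimal Python sketch of the control flow only. It does not use the OCR SDK's real API: `engine`, `fake_engine`, the page name, and the language codes are all hypothetical placeholders, and you would swap in whatever call your OCR tool actually provides. The one point it illustrates is that each pass is stored separately, so the second pass does not discard the first.

```python
from typing import Callable, Dict, List

# Hypothetical engine signature: (page identifier, language code) -> recognized text.
# This is an assumption for illustration, not the signature of any real SDK call.
OcrEngine = Callable[[str, str], str]


def ocr_each_language(page: str, languages: List[str], engine: OcrEngine) -> Dict[str, str]:
    """Run one OCR pass per language and keep the output of every pass."""
    layers: Dict[str, str] = {}
    for language in languages:
        # Each pass is stored under its own key, so a later pass
        # never overwrites the result of an earlier one.
        layers[language] = engine(page, language)
    return layers


def fake_engine(page: str, language: str) -> str:
    # Stand-in for a real OCR call, present only so the sketch runs.
    return f"<{language} text of {page}>"


layers = ocr_each_language("scan-001", ["english", "chinese_traditional"], fake_engine)
for language, text in layers.items():
    print(language, "->", text)
```

How the separate results are then combined (for example, as stacked text layers in the PDF) depends on the tool, so that step is left out here.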

Post by **pyrrolidine** » Wed Sep 11, 2013 2:38 pm

Please add this feature; it is very important. All my scanned PDF documents are written in two languages. I would be ready to buy the program if it had this capability, but so far there are no alternatives.

Post by **Walter-Tracker Supp** » Wed Sep 11, 2013 5:30 pm

There is some limited support for recognition of mixed Chinese (Traditional or Simplified) and Latin script. I would recommend trying the free PDF-XChange Viewer (from our downloads page) with the Chinese language package on some sample documents, as it uses the same underlying OCR engine and languages as our SDK. One caveat: the Viewer does not have the automatic deskew that is available in the SDK, so for best results use pages that are already level. Select one of the Chinese languages in the OCR options when you start an OCR job. Note that OCR with Chinese characters can take a little longer than Latin character recognition.
