OCR for mixed language documents

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

cyberguy321
User
Posts: 2
Joined: Thu Dec 20, 2012 1:56 pm

OCR for mixed language documents

Post by cyberguy321 »

Hi there,

I would like to know whether the OCR SDK can recognize a document with mixed languages, such as English + Traditional Chinese.

Regards,

Norman
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: OCR for mixed language documents

Post by John - Tracker Supp »

I'm afraid there are no plans to provide this at present.
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com
cyberguy321
User
Posts: 2
Joined: Thu Dec 20, 2012 1:56 pm

Re: OCR for mixed language documents

Post by cyberguy321 »

Hi John,

So we can only recognize one language for a document at a time, correct? Does this language include the numeric characters?

Thanks.

Norman
Paul - Tracker Supp
Site Admin
Posts: 6835
Joined: Wed Mar 25, 2009 10:37 pm
Location: Chemainus, Canada

Re: OCR for mixed language documents

Post by Paul - Tracker Supp »

Hi cyberguy321

Certainly, Latin-based languages should recognize Arabic numerals (1, 2, 3, 4, 5, 6, 7, 8, 9, 0). Similarly, I would expect Chinese to recognize the numeric kanji (is that the right word? I know it is in Japanese, but I'm not sure about Chinese) as it would any other word or character.

I haven't tried this, and the OCR expert isn't available today, but have you tried running the OCR twice, once in each language? Be sure to select "Preserve Original Content & Add Text layer" so you don't lose your previous results. I'd be keen to hear how that goes.
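
For anyone scripting the two-pass idea rather than using the Viewer dialog, here is a rough sketch of how results from two single-language passes might be reconciled. It is not the PDF-X OCR SDK API: the `Word` record, `overlap_ratio`, `merge_passes`, and the assumption that an engine reports a bounding box and a confidence score per word are all hypothetical, made up for illustration. Where both passes cover the same region, the higher-confidence reading is kept.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass(frozen=True)
class Word:
    """One recognized word from a hypothetical OCR pass."""
    text: str
    box: Box
    confidence: float  # assumed to range from 0.0 to 1.0


def overlap_ratio(a: Box, b: Box) -> float:
    """Intersection area divided by the area of the smaller box."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    if width <= 0 or height <= 0:
        return 0.0
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return (width * height) / smaller if smaller > 0 else 0.0


def merge_passes(first: List[Word], second: List[Word],
                 threshold: float = 0.5) -> List[Word]:
    """Combine two passes, keeping the more confident word per region."""
    merged = list(first)
    for candidate in second:
        # Indices of already-kept words that cover the same region.
        rivals = [i for i, kept in enumerate(merged)
                  if overlap_ratio(kept.box, candidate.box) >= threshold]
        if not rivals:
            merged.append(candidate)
        elif all(candidate.confidence > merged[i].confidence for i in rivals):
            for i in reversed(rivals):
                del merged[i]
            merged.append(candidate)
    # Rough reading order: top to bottom, then left to right.
    return sorted(merged, key=lambda w: (w.box[1], w.box[0]))


# Made-up sample data: an English pass and a Traditional Chinese pass.
english_pass = [Word("Invoice", (0, 0, 50, 10), 0.95),
                Word("??", (60, 0, 100, 10), 0.20)]
chinese_pass = [Word("lnvoice", (0, 0, 50, 10), 0.30),
                Word("發票", (60, 0, 100, 10), 0.90)]
merged_texts = [w.text for w in merge_passes(english_pass, chinese_pass)]
```

Picking by confidence is only one possible policy. Keeping both text layers, as the "Preserve Original Content & Add Text layer" option does, avoids discarding anything but can leave duplicate text in overlapping regions.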

Best regards

Paul O'Rorke
Tracker Support North America
http://www.tracker-software.com
pyrrolidine
User
Posts: 1
Joined: Wed Sep 11, 2013 2:34 pm

Re: OCR for mixed language documents

Post by pyrrolidine »

You should add this feature; it is very important. All my scanned PDF documents are written in two languages. I'm ready to buy the program if it offers this capability, but right now there are no alternatives.
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: OCR for mixed language documents

Post by Walter-Tracker Supp »

There is some limited support for recognizing mixed Chinese (Traditional or Simplified) and Latin script. I recommend trying the free PDF-XChange Viewer (from our downloads page) with the Chinese language package on some sample documents, as it uses the same underlying OCR engine and languages as our SDK. One caveat: the Viewer does not have the automatic deskew that is available in the SDK, so for best results use pages that are already level. Select one of the Chinese languages in the OCR options when you start an OCR job. Note that OCR with Chinese characters can take a little longer than Latin character recognition.