OCR - multilanguage use

Discussion for the End User use of OCR in PDF-XChange Editor and Viewer


kejos
User
Posts: 5
Joined: Tue Feb 28, 2012 2:57 pm

OCR - multilanguage use

Post by kejos »

Hi,
Is there any way to recognize text in two or more languages at the same time?

For example, if one page of a document contains Thai text and its French translation, could I convert both texts in one pass?

Greets
kejos
Tracker Supp-Stefan
Site Admin
Posts: 17810
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: OCR - multilanguage use

Post by Tracker Supp-Stefan »

Hello Kejos,

I am afraid that this is not currently possible.
If you have the Thai text and the French translation on separate pages, you can easily OCR only the needed pages with the specific language, but it's not possible to tell our OCR to process only part of a page, e.g. half of it.

A possible workaround is to duplicate the page, cover one language with a white rectangle on the first copy and the other language on the second copy, and then OCR each copy separately with the appropriate language selected in the OCR tool.

Best,
Stefan
kejos

Re: OCR - multilanguage use

Post by kejos »

Thanks Stefan!
kejos
Tracker Supp-Stefan

Re: OCR - multilanguage use

Post by Tracker Supp-Stefan »

:)
Ludwig
User
Posts: 17
Joined: Sun Feb 24, 2013 1:52 pm

Re: OCR - multilanguage use

Post by Ludwig »

Hi there,
I just want to second Kejos' request. It would be great if more than one language could be recognised in a single OCR pass. I would prefer the option of OCR-ing a document in several languages simultaneously rather than telling the program which parts should be OCR-ed in which language. By this I mean that I would like to be able to enable two or three languages before OCR-ing, so that every single word is checked against each of these languages and the text layer finally records each word in the language it most likely belongs to. Of course, the OCR itself would then take two or three times as long as usual. I very often have documents with two languages on one page.
Please see the attached sample file with (Ancient) Greek and German: I would like "καὶ ἄρχοντα" to be recognised as "και αρχοντα" and not as "kai apxovta" (the correct transliteration would be "kai archonta"), whereas the German parts should be recognised as German.

I also very often have scans of bilingual books where two facing pages appear on one (landscape) page, e.g. French on the left and English on the right. In this case my suggested option of OCR-ing in several languages simultaneously would be more helpful.

Best regards,
Ludwig
Sample.pdf
(1.3 MiB)
Tracker Supp-Stefan

Re: OCR - multilanguage use

Post by Tracker Supp-Stefan »

Hello Ludwig,

Automatically recognizing different languages, especially if they use similar or, worse, the same alphabet, could be quite tricky, and I can't promise that it will be available. As mentioned before (in another topic, I believe), we are considering an option that would let you specify zones to be OCRed and select a specific (but only one) language for each zone.

Regards,
Stefan