PDF OCR

PDF-XChange Editor SDK for Developers

HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

PDF OCR

Post by HomerWu »

Hello,

I'm trying to OCR pages with the "op.document.OCRPages" operation, but I run into trouble while the program is running. Can anyone help me? The code I'm using is copied from 'http://sdkhelp.trackersoftware.com/view ... t_OCRPages', and the attachment contains the error info and the source code.
Attachments
Source code and Error info.rar
(1.77 MiB) Downloaded 98 times
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

It seems that you haven't loaded the OCR plugin.
Please read this topic:
https://forum.pdf-xchange.com/ ... 913#p97913
Also, don't forget about the PluginsData folder: it should contain the OCRLanguages folder with the dictionaries that the OCR needs to work correctly.

Cheers,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Thanks for your reply. The "op.document.OCRPages" operation works now.
But it seems that I cannot do OCR on an image file. I opened an image file and ran OCR on it; after the operation finished, I opened the Search panel, but it returned no results. I have put the PluginsData folder in the same folder as PDFXEditCore.dll, but it seems to have no effect.
When I open the same image file in your PDF-XChange Editor and execute the "OCR Pages" command, the Search panel returns the results I searched for.
The attachment contains my source code, and the image file is a screenshot from your PDF-XChange Editor. Can anyone help me? I want to do OCR on image files.
Attachments
PDF-XChange Edit result.rar
(231.48 KiB) Downloaded 92 times
PluginsData.rar
(2.86 MiB) Downloaded 91 times
PDF-XchangeDemo_simple.rar
(4.33 MiB) Downloaded 91 times
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

Please read more carefully:
within it there should be the OCRLanguages folder with dictionaries for the correct OCR work.
Cheers,
Alex
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Dear Sasha,
Yes, I have put the language folder into the directory :P, but when I uploaded my attachment I found that it was too large, so I put the language files into PluginsData.rar. Please download PluginsData.rar and put it in my project. Thanks a lot.
Attachments
OCRLanguagesFiles.rar
(56.3 KiB) Downloaded 96 times
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

I opened your project, switched to x86, copied the needed files from your Debug directory to the x86\Debug directory, and ran the OCR. Here are my results:
Image

Cheers,
Alex
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Hi Sasha,
Thanks for your quick reply. I have tried many things since then, but it still doesn't work :cry:
Do you have any other Tracker software installed on your machine? I wonder whether the result depends on the machine's environment, because we only bought PDF-XChange. The attachment contains screenshots of my x86\Debug directory. Is there any difference between mine and yours?
Attachments
x86 debug.rar
(62.87 KiB) Downloaded 93 times
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

What version of the Editor SDK are you using?

Cheers,
Alex
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Hi Alex,
The version of the Editor SDK is 6.0.318.1. We bought your license last year. Could you please help me solve the OCR problem?
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am
Contact:

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

Please try using the latest version and see whether the problem reoccurs.

Cheers,
Alex
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Hello Sasha,


I have just tried the OCR function again, and I found that it works well when I put the PluginsData folder into the same folder as PDFXEditCore.dll.
Is that the correct way to do it? When I last posted in this topic, I had put the files into the Debug folder, and it did not work.
Attachments
screenshot.png
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

The Plugins.xXX, PluginsData, and Languages folders, along with the Resources.dat file, need to be in the same folder as PDFXEditCore.xXX.dll.
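
A quick way to catch a wrong layout before initializing the SDK is to check the folder at startup. The following is only a minimal sketch that uses standard .NET file APIs, not anything from the SDK itself: the class name SdkLayoutCheck and the method FindMissing are hypothetical, and it assumes that "xXX" stands for "x86" or "x64" and that OCRLanguages sits inside PluginsData as described earlier in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SdkLayoutCheck
{
    // Hypothetical helper (not part of the Editor SDK).
    // Returns the relative paths of the expected items that are missing
    // from coreDir. arch is "x86" or "x64" (the "xXX" part of the names).
    public static List<string> FindMissing(string coreDir, string arch)
    {
        var missing = new List<string>();

        // Files expected directly in coreDir.
        string[] files = { "PDFXEditCore." + arch + ".dll", "Resources.dat" };
        foreach (string f in files)
        {
            if (!File.Exists(Path.Combine(coreDir, f)))
                missing.Add(f);
        }

        // Folders expected next to the core DLL.
        // OCRLanguages is expected inside PluginsData.
        string[] folders =
        {
            "Plugins." + arch,
            "PluginsData",
            Path.Combine("PluginsData", "OCRLanguages"),
            "Languages"
        };
        foreach (string d in folders)
        {
            if (!Directory.Exists(Path.Combine(coreDir, d)))
                missing.Add(d);
        }

        return missing;
    }
}
```

Note that coreDir has to be the folder of the PDFXEditCore.xXX.dll that is actually loaded. If the DLL is registered from another location, checking the application's Debug folder will not tell you anything useful.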

Cheers,
Alex
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Hello Sasha,

Thank you very much. So my operation is correct?
Also, as a developer, if I want to install my software on other machines that do not have the PDF-XChange Editor SDK installed, what should I do? Just copy PDFXEditCore.xXX.dll to the target folder?
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello HomerWu,

As I mentioned in my previous reply, your code worked OK for me.
When deploying a build, you need the PDFXEditCore.xXX.dll file, and the Plugins.xXX, PluginsData, and Languages folders, along with the Resources.dat file, need to be in the same folder as PDFXEditCore.xXX.dll.
Cheers,
Alex
HomerWu
User
Posts: 91
Joined: Fri Nov 25, 2016 8:19 am

Re: PDF OCR

Post by HomerWu »

Thank you Sasha, I will give it a try.
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

:)
DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: PDF OCR

Post by DolphinMann »

I have another OCR question. I was able to follow along with the two linked threads, but then I found this help file: https://sdkhelp.pdf-xchange.com/vie ... es_Options

It seems that OutputType = 0 for all possible options. What is the correct value to leave the current PDF as it is and add another text layer with the results?

Edit: Nevermind. I think I found the language packs available for download, but could still use some help to locate a list of available plugins if possible.

Also, when I execute the Sample Program with the OCR files in place, nothing happens. No error, but no results either. I am using the Editor SDK 6.0.322.4.
My instincts tell me the problem is that I am missing language files, but they are not even present in the sample program I downloaded. Is there a program that would include those or is there a way to determine where it is looking?
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

Check out this topic:
https://forum.pdf-xchange.com/ ... 66&t=25056

Cheers,
Alex
DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: PDF OCR

Post by DolphinMann »

Thank you. I have so many versions of things installed I had to dig through the registry to determine where it was actually registered to.
DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: PDF OCR

Post by DolphinMann »

Thank you. I was able to get the demo program working but I am a little confused by this statement from the other thread:
Hello docu-track99,

The problem is that you did not include the OCR plugin in your code.
Here's the code that needs to be evaluated for the correct work of the OCR (also read the comments provided):
Code: Select all
//Note that plugins must be loaded after you've initialized the Instance but before the control is initialized
Inst.StartLoadingPlugins();
//Also, the project should contain the PluginsData folder and within it there should be the OCRLanguages folder with dictionaries for the correct command work
Inst.AddPluginFromFile("..\\Plugins\\OCRPlugin.pvp");
Inst.FinishLoadingPlugins();


Cheers,
Alex
What if I do not have a PDF Control? I am just trying to execute this in the background...this is my code.

Also still a question on: https://sdkhelp.pdf-xchange.com/vie ... es_Options
It lists zero both for leaving the PDF as it is and adding a text layer, and for creating an image PDF with a text layer on top.

Code: Select all

                        viewerInstance = new PXV_Inst();
                        viewerInstance.Init(null, DolphinCorePDF.devKey);

                        string pluginLoadPath = Environment.GetEnvironmentVariable("DolphinPath");
                        viewerInstance.StartLoadingPlugins();
                        viewerInstance.AddPluginFromFile(pluginLoadPath + @"OCRPlugin.pvp");
                        viewerInstance.AddPluginFromFile(pluginLoadPath + @"ConvertPDF.pvp");
                        viewerInstance.FinishLoadingPlugins();
                        auxInst = (IAUX_Inst)viewerInstance.GetExtension("AUX");
                        
                        IPXC_Inst pxcInst = (IPXC_Inst)viewerInstance.GetExtension("PXC");
                        IPXC_Document doc = pxcInst.OpenDocumentFromFile(inputPDF, clbk);

                        int nID = viewerInstance.Str2ID("op.document.OCRPages", false);
                        PDFXEdit.IOperation Op = viewerInstance.CreateOp(nID);
                        PDFXEdit.ICabNode input = Op.Params.Root["Input"];
                        input.v = doc;
                        PDFXEdit.ICabNode options = Op.Params.Root["Options"];
                        options["PagesRange.Type"].v = "All"; //OCR all pages
                        options["OutputType"].v = 0;
                        options["OutputDPI"].v = 300;
                        Op.Do();

                        doc.Close();
                        options.Clear();
                        input.Clear();
No errors, but no results. The demo program does work and the only difference I could find was some strange double creation of the IPXV_Inst, once to load plugins and another time to execute.
DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: PDF OCR

Post by DolphinMann »

New information. I took the sample GUI app, which works, and switched it to a console application (code below), which does not work. However, I am now getting more information, and the problem is related to languages:

Error:
Error opening data file: <my path to PDFXEditCore.x86.dll>./osd_pxvocr.dat
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed Loading language 'osd'
Tesseract couldn't load any languages!
Warning: Auto orientation and script requested, but osd language failed to load
Console App that fails:

Code: Select all

            string licKeyFormal = @"somelicensevalue";
            string licKey = licKeyFormal;
            AuthCallback clbk = new AuthCallback();

            PDFXEdit.IPXV_Inst Inst;
            Inst = new PXV_Inst();
            Inst.Init(null, licKeyFormal);
            string pluginLoadPath = Environment.GetEnvironmentVariable("DolphinPath");
            Inst.StartLoadingPlugins();
            Inst.AddPluginFromFile(pluginLoadPath + @"OCRPlugin.pvp");
            Inst.AddPluginFromFile(pluginLoadPath + @"ConvertPDF.pvp");
            Inst.FinishLoadingPlugins();
            IPXC_Inst pxcInst = (IPXC_Inst)Inst.GetExtension("PXC");
            IPXC_Document Doc = pxcInst.OpenDocumentFromFile(@"C:\Users\mark.mann\Desktop\test\Destec Pencils.pdf", clbk);
            if (Doc == null)
            {
                return;
            }
            int nID = Inst.Str2ID("op.document.OCRPages", false);
            PDFXEdit.IOperation Op = Inst.CreateOp(nID);
            PDFXEdit.ICabNode input = Op.Params.Root["Input"];
            input.v = Doc;
            PDFXEdit.ICabNode options = Op.Params.Root["Options"];
            options["PagesRange.Type"].v = "All";
            options["OutputType"].v = 0;
            options["OutputDPI"].v = 300;
            Inst.AsyncDoAndWaitForFinish(Op);
            Doc.Close();
I attached the code with the languages and plugin files removed since they caused the zip to be too large.
Attachments
PDF-XchangeDemo_simple.zip
(934.68 KiB) Downloaded 61 times
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

I've reproduced the problem and reported it to the programmer who implemented the OCR. He will look into why the osd language is being used here.
In the meantime, I've found a workaround. Please download this archive:
https://www.pdf-xchange.com/downloa ... rt_OCR.zip
Take the osd_pxvocr.dat file from it and put it into the PluginsData\OCRLanguages folder. That should work.
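
Since a console application only reports the missing osd file through Tesseract's log output, it can help to check the language data before running the operation. This is a rough sketch using only standard .NET file APIs: OcrDataCheck and its methods are hypothetical names, not SDK calls, and the only file name it relies on is osd_pxvocr.dat from the error message in this thread.

```csharp
using System;
using System.IO;
using System.Linq;

public static class OcrDataCheck
{
    // Hypothetical helper (not part of the Editor SDK).
    // Lists the *.dat file names found in <coreDir>\PluginsData\OCRLanguages,
    // or returns an empty array if that folder does not exist.
    public static string[] ListLanguageFiles(string coreDir)
    {
        string langDir = Path.Combine(coreDir, "PluginsData", "OCRLanguages");
        if (!Directory.Exists(langDir))
            return new string[0];

        return Directory.GetFiles(langDir, "*.dat")
                        .Select(f => Path.GetFileName(f))
                        .OrderBy(n => n, StringComparer.OrdinalIgnoreCase)
                        .ToArray();
    }

    // True if the osd data file named in the error message is present.
    public static bool HasOsdData(string coreDir)
    {
        return ListLanguageFiles(coreDir)
            .Contains("osd_pxvocr.dat", StringComparer.OrdinalIgnoreCase);
    }
}
```

Calling HasOsdData with the folder of PDFXEditCore.xXX.dll before Inst.AsyncDoAndWaitForFinish(Op) lets the application report a clear message instead of finishing with no results.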

Cheers,
Alex
DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: PDF OCR

Post by DolphinMann »

Thank you for the response. It does seem to be working as I am now getting OCR results.

Could you still please help me with the Options section though? Link: https://sdkhelp.pdf-xchange.com/vie ... es_Options

Edit: OK, with the same example we have been working with, I can 100% confirm that the file you provided fixed OCR, and I now get proper results. However, if I execute a Doc.WriteToFile command once OCR is complete (essentially saving the file), I end up with the following:

1 PDF
if I view the content I get:
-Original Image
-New Image which is basically a copy of the original
-A bunch of text content


Is there a way to NOT have the image content doubled, other than taking the Page.GetText elements or the content itself and manually copying it from one PDF to another? Basically, I want to add JUST the text, not any other type of content, as that would already be on the form. Example uploaded: the original is Destec Pencils, and the one after op.document.OCRPages is Destec Pencils2.
Attachments
Destec Pencils.rar
(12.05 KiB) Downloaded 54 times
Destec Pencils2.rar
(127.66 KiB) Downloaded 54 times
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

Hello DolphinMann,

We know about this problem with the OCR - it exists in the End-User Editor too - and we will try to fix it by the next release.

Cheers,
Alex
DolphinMann
User
Posts: 158
Joined: Mon Aug 04, 2014 7:34 pm

Re: PDF OCR

Post by DolphinMann »

Thank you.
Sasha - Tracker Dev Team
User
Posts: 5522
Joined: Fri Nov 21, 2014 8:27 am

Re: PDF OCR

Post by Sasha - Tracker Dev Team »

:)