OCR of pdf and pictures
Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
OCR of pdf and pictures
We bought a PRO SDK license under CrimsonLogic Pte Ltd.
I currently have three problems while doing OCR in my WPF application:
1) I am not able to OCR PDFs with 17 or more pages.
2) Some successfully OCRed files have overlaid text, as shown in the attached screenshot. How can I fix this?
3) When I convert an image to PDF, the image is quite small compared to the original image. Where can I change the image size?
I've tried changing the last two values in the line below, but I couldn't make the image bigger in the PDF file:
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));
Please advise. Thank you very much.
- Attachments
-
- pdf-xchange_screenshot.pdf
- (63.23 KiB)
- John - Tracker Supp
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: OCR of pdf and pictures
Hi,
Can we please keep all OCR-related questions in one forum (or in email)? You are posting in multiple forums and also sending emails, which is not helpful and divides the effort to assist you, as we first have to check whether some items have already been answered in emails or in other forums.
I will move this one, and any others, to the OCR forum so we can address them all logically. Thank you.
If posting files to this forum, you must archive them as a ZIP, RAR or 7z file or they will not be uploaded. Thank you.
Best regards
Tracker Support
http://www.tracker-software.com
- John - Tracker Supp
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: OCR of pdf and pictures
RE: your questions:
1) I am not able to OCR PDFs with 17 or more pages.
Please advise which version of our products you are using and the hardware spec (processor, drive space, RAM and OS). Please also provide an example of the PDF being OCR'd. Could it be that you are running out of resources? Perhaps try breaking the job into 'chunks'.
2) Some successfully OCRed files have overlaid text, as shown in the attached screenshot. How can I fix this?
Please supply before/after PDF files for us to analyse, along with a snippet of the code you are using for this specific task.
3) When I convert an image to PDF, the image is quite small compared to the original image. Where can I change the image size?
I've tried changing the last two values in the line below, but I couldn't make the image bigger in the PDF file:
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));
I have asked a colleague to help and advise on this specifically...
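A minimal sketch of the 'chunks' idea from item 1, written in Python for brevity (the thread's own code is C#). It only computes zero-based, inclusive page ranges. How each range would be handed to the OCR SDK is an assumption to check against the SDK documentation: the posted code passes a null page list to OCR_MakeSearchable to mean "all pages", and the format of a non-null page list is not shown in this thread. The helper name `page_chunks` is hypothetical.

```python
from typing import List, Tuple


def page_chunks(page_count: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split pages 0..page_count-1 into consecutive (first, last) ranges, inclusive."""
    if page_count < 0 or chunk_size < 1:
        raise ValueError("page_count must be >= 0 and chunk_size >= 1")
    return [
        (first, min(first + chunk_size, page_count) - 1)
        for first in range(0, page_count, chunk_size)
    ]


if __name__ == "__main__":
    # A 17-page document processed five pages at a time.
    print(page_chunks(17, 5))
```

For the 17-page document this yields four ranges, the last one covering only pages 15 and 16. Chunking bounds how much work is in flight at once; it does not by itself change how much memory a single page needs.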
Best regards
Tracker Support
http://www.tracker-software.com
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi John,
1) I am not able to OCR PDFs with 17 or more pages.
>> We bought a license for the PDF-XChange PRO SDK.
>> Your website says:
**NEW OCR Module Included** - Now includes PDF-X OCR SDK Module for converting image based PDF files to fully text searchable PDF files at no charge. For more information on this exciting new module and usage requirements for the free new add-on please visit our PDF-X OCR SDK Module page
>> We are using this PDF-X OCR SDK.
>> Machine: 8 GB RAM, i7, 64-bit OS.
>> Attached is the 17-page PDF; please try to OCR it and let us know the outcome.
>> (Please note that this 17-page PDF was converted from a Word document, as your forum does not allow us to upload the Word file.)
>> (Let us know if you need us to email you the Word copy.)
>> Please see the code below.
2) Some successfully OCRed files have overlaid text, as shown in the attached screenshot. How can I fix this?
>> Attached is the PDF for your investigation. Please go through it to see the issue.
>> (The program code for the OCR is provided below.)
3) When I convert an image to PDF, the image is quite small compared to the original image. Where can I change the image size?
I've tried changing the last two values in the line below, but I couldn't make the image bigger in the PDF file:
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));
>> For this one, we will wait for your feedback.
>> The code for OCRing the PDF:
private string ConvertPDFToOCR(string m_SourceFilename, string m_DestFilename, string language)
{
string result = "OK";
IntPtr pdf = IntPtr.Zero; // OCR document handle; released in the finally block
int hResult;
int m_DPI;
string m_Datapath = Path.GetDirectoryName(Assembly.GetExecutingAssembly().GetName().CodeBase).Replace("file:\\", "") + @"\OCRLanguages\";
try
{
// Resolve the language inside the try block: Array.IndexOf returns -1 for an unknown
// name, which would otherwise throw IndexOutOfRangeException outside the handler.
int langIndex = Array.IndexOf(PDFXOCR_Funcs.OCR_LangFullArrayW, language);
if (langIndex < 0)
{
return "Unknown OCR language: " + language;
}
PDFXOCR_Funcs.PXO_Language m_Language = (PDFXOCR_Funcs.PXO_Language)langIndex;
string langinit = PDFXOCR_Funcs.OCR_LangArrayW[langIndex];
// Check that the language file exists, e.g. ...\OCRLanguages\ocrdats\eng_pxvocr.dat
string langfile = m_Datapath + @"ocrdats\" + langinit + "_pxvocr.dat";
if (!System.IO.File.Exists(langfile))
{
return "Language file missing: " + langfile;
}
m_DPI = 200; // rasterization resolution used for OCR (quality of OCR)
string regkey = "XXXXXXXXXXXXXXXXXXXXXXX";
string devcode = "XXXXXXXXXXXXXXXXXXXXXXX";
// Stop at the first failure instead of carrying on with an invalid handle.
hResult = PDFXOCR_Funcs.OCR_Init(out pdf, regkey, devcode);
if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
{
return "OCR initialization failure.\nError code: " + hResult.ToString();
}
hResult = PDFXOCR_Funcs.OCR_SetCallback(pdf, thecallback, 0);
hResult = PDFXOCR_Funcs.OCR_LoadW(pdf, m_SourceFilename);
if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
{
return "Error loading file: " + m_SourceFilename + "\nError code: " + hResult.ToString();
}
PDFXOCR_Funcs.PXO_Options Options = new PDFXOCR_Funcs.PXO_Options();
Options.blacklist = string.Empty;
Options.whitelist = string.Empty;
Options.raster_dpi = m_DPI;
Options.ImageFlags = (uint)PDFXOCR_Funcs.OCR_ImageProcessingFlags.OCR_Image_FastAutorotate;
Options.DataPath = m_Datapath;
Options.lang = m_Language;
Options.RegionMode = PDFXOCR_Funcs.OCR_RegionMode.OCR_Auto;
Options.reserved = 0;
IntPtr pxoPagelist = IntPtr.Zero; // a null page list makes OCR_MakeSearchable() process all pages
hResult = PDFXOCR_Funcs.OCR_MakeSearchable(pdf, ref Options, pxoPagelist);
if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
{
return "Error running OCR_MakeSearchable.\nError code: " + hResult.ToString();
}
hResult = PDFXOCR_Funcs.OCR_SaveW(pdf, m_DestFilename);
if (PDFXOCR_Funcs.IS_DS_FAILED(hResult))
{
return "Error saving output PDF file.\nError code: " + hResult.ToString();
}
}
catch (Exception ex)
{
result = "[EXCEPTION] " + ex.GetType() + ": " + ex.Message + "\n" + ex.StackTrace;
}
finally
{
// Release the OCR handle on every path, including early returns and exceptions.
if (pdf != IntPtr.Zero)
{
PDFXOCR_Funcs.OCR_Delete(out pdf);
}
}
return result;
}
>> The code for converting Word to PDF:
private bool ConvertToPDF(string pdfpath, string inputfile)
{
bool isDone = false;
PXCComLib5.CPXCPrinter PDFPrinter;
PXCComLib5.CPXCControlEx prnFactory = new PXCComLib5.CPXCControlEx();
string regkey = "XXXXXXXXXXXX";
string devcode = "XXXXXXXXXXXX";
PDFPrinter = (PXCComLib5.CPXCPrinter)prnFactory.get_Printer("", "PDF-XChange Printer 2012", regkey, devcode);
PDFPrinter.Option["Save.ShowSaveDialog"] = false;
PDFPrinter.Option["Save.RunApp"] = false;
PDFPrinter.Option["Save.Path"] = pdfpath;
PDFPrinter.Option["Save.WhenExists"] = 1; //overwrite
PDFPrinter.SetAsDefaultPrinter();
System.Diagnostics.Process printJob = new System.Diagnostics.Process();
printJob.StartInfo.FileName = inputfile;
printJob.StartInfo.UseShellExecute = true;
printJob.StartInfo.Verb = "print";
printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
printJob.Start();
printJob.WaitForExit();
isDone = true;
return isDone;
}
- Attachments
-
- ABST.PDF
- The overlay PDF
- (20.76 KiB)
-
- test 17 pages and image-comment.pdf
- 17 pages PDF copy
- (109.91 KiB)
- Lzcat - Tracker Supp
- Site Admin
- Posts: 677
- Joined: Thu Jun 28, 2007 8:42 am
Re: OCR of pdf and pictures
Hi.
Regarding your question:
3) When I convert an image to PDF, the image is quite small compared to the original image. Where can I change the image size?
I've tried changing the last two values in the line below, but I couldn't make the image bigger in the PDF file:
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(3), Common.I2L(2));
If you read the help for the PXC_PlaceImage function, you can see that the last two parameters specify the width and height of the image in points (1/72 inch). I cannot see the code of your I2L function, so I cannot say why you are getting such small images: either because of an error in I2L, or because the values 3 and 2 are simply too small.
HTH.
Victor
Tracker Software
Project manager
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
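The arithmetic behind Victor's answer, as a Python sketch: a PDF point is 1/72 inch, so an image's natural size in points is pixels × 72 / DPI. The code of Common.I2L was not shown; if it converts inches to points (an assumption based only on its name), then I2L(3) and I2L(2) request a fixed 3 × 2 inch box (216 × 144 pt) whatever the source image size. The helper names below are hypothetical, and the source image's pixel size and DPI are assumed to be known.

```python
POINTS_PER_INCH = 72.0  # PDF user-space unit: 1 point = 1/72 inch


def pixels_to_points(pixels: int, dpi: float) -> float:
    """Physical size of `pixels` at `dpi`, expressed in PDF points."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels * POINTS_PER_INCH / dpi


def fit_in_box(px_w: int, px_h: int, dpi: float,
               box_w_pt: float, box_h_pt: float) -> tuple:
    """Natural image size in points, scaled down uniformly (never up)
    so that it fits inside a box_w_pt x box_h_pt area."""
    w = pixels_to_points(px_w, dpi)
    h = pixels_to_points(px_h, dpi)
    scale = min(1.0, box_w_pt / w, box_h_pt / h)
    return w * scale, h * scale


if __name__ == "__main__":
    # A 2550 x 3300 px scan at 300 dpi, placed on a Letter page (612 x 792 pt)
    # with 1-inch margins, i.e. a 468 x 648 pt area.
    print(fit_in_box(2550, 3300, 300, 468.0, 648.0))
```

The resulting width and height would be candidates for the last two PXC_PlaceImage arguments; keeping one scale factor for both axes preserves the aspect ratio.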
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Hello crimsonlogic,
As for the error code: it means OCR_ERR_INVALID_DICT_PATH, i.e. you gave a wrong path to the dictionary folder.
Please use these functions when investigating problems in future:
Code:
OCRCORE_API LONG OCR_API OCRE_Err_FormatSeverity(HRESULT errorcode, LPSTR buf, LONG maxlen);
OCRCORE_API LONG OCR_API OCRE_Err_FormatFacility(HRESULT errorcode, LPSTR buf, LONG maxlen);
OCRCORE_API LONG OCR_API OCRE_Err_FormatErrorCode(HRESULT errorcode, LPSTR buf, LONG maxlen);
HTH,
Alex
Subscribe at:
https://www.youtube.com/channel/UC-TwAMNi1haxJ1FX3LvB4CQ
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
Sorry, I don't quite understand. Which error code are you referring to?
Thanks
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Hello crimsonlogic,
It's about the error code you asked about: ERROR CODE -2113263855 == 0x820A2711
HTH
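Going from the decimal value a .NET `int` shows to the hex form quoted above is a two's-complement reinterpretation. The Python sketch below does that and also splits the value into fields, assuming the standard Windows HRESULT layout (severity in bit 31, facility in bits 16-26, code in bits 0-15). Whether Tracker's codes follow that layout exactly is an assumption; the OCRE_Err_Format* functions listed above remain the authoritative way to get the meaning. `decode_hresult` is a hypothetical helper.

```python
def decode_hresult(code: int) -> dict:
    """Reinterpret a signed 32-bit error code (as seen in a C#/.NET int) and
    split it into fields, assuming the standard HRESULT bit layout."""
    value = code & 0xFFFFFFFF            # two's complement -> unsigned 32-bit
    return {
        "hex": f"0x{value:08X}",
        "failed": bool(value >> 31),     # severity bit set = failure
        "facility": (value >> 16) & 0x7FF,
        "code": value & 0xFFFF,
    }


if __name__ == "__main__":
    for err in (-2113263855, -2113667071):
        print(err, decode_hresult(err))
```

The same conversion turns the later -2113667071 into 0x82040001.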
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
By the way, it would be better if you could provide a small sample project (with your DLLs included) where the problems occur, plus a guide on how to reproduce them. Then we could help you more efficiently, because right now there are many questions on our side that could be answered if we had a working project.
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
We will send you a sample program and documents to try out by email (support@pdf-xchange.com), due to the file-size limit for attachments in this forum. We will send them in two separate emails. Thanks for your help.
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
We tried to email you the programs and sample files, but the emails could not be sent due to the file size. Do you have any alternative way for us to deposit our files? Thanks.
- John - Tracker Supp
- Site Admin
- Posts: 5219
- Joined: Tue Jun 29, 2004 10:34 am
- Location: United Kingdom
Re: OCR of pdf and pictures
How big are the attachments?
Best regards
Tracker Support
http://www.tracker-software.com
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
The program file is about 25 MB and the sample files are about 4 MB after zipping.
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Please upload them to Google Drive or Dropbox and give us a link.
Cheers,
Alex
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
Our client is a government agency and prohibits us from uploading their code to the cloud due to security concerns.
Could you please provide a secure repository where we can upload the files? Thank you very much.
- Tracker Supp-Stefan
- Site Admin
- Posts: 17941
- Joined: Mon Jan 12, 2009 8:07 am
- Location: London
Re: OCR of pdf and pictures
Hello crimsonlogic,
Maybe you can upload the files to our ftp server?
You can find the details for it here:
https://www.pdf-xchange.com/knowledgebase/321
However, as the FTP server is open to anyone, we recommend that you password-protect the uploaded files and then send us the password, e.g. by email to support@pdf-xchange.com.
Regards,
Stefan
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Stefan,
Thank you for your reply. We have uploaded the files and sent the password by email.
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Hello crimsonlogic,
Thanks for the sample; we'll look at it.
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
Any updates?
Thanks
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Hello crimsonlogic,
Looking at your files in media.zip, this is what we have found so far:
The DWC.pdf you created had already been OCR'd by an external converter (libtiff / tiff2pdf - 2.3.606.0), which added a text overlay of invisible text.
When this file is OCR'd again, that text becomes visible, and the background image together with this text goes through our OCR engine. As a result you get the visible text (aligned to the top in your example) plus the OCR'd background image with new invisible text on top of it. Naturally, the recognized text will be corrupted wherever the image was overlaid with the previously invisible text.
HTH,
Alex
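On the related question of telling whether a PDF already carries an OCR text layer: such layers are commonly drawn with text rendering mode 3 (the `3 Tr` operator, invisible text) in the page content streams. The Python sketch below is only a heuristic built on that assumption. It scans uncompressed and Flate-compressed streams with regular expressions and does not handle encrypted files, other stream filters, or text hidden by other means; a proper PDF parser is the reliable route. `has_invisible_text` is a hypothetical helper, not an SDK function.

```python
import re
import zlib

# A stream body: the "stream" keyword (not the tail of "endstream"),
# an end-of-line marker, the data, then "endstream".
_STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\s*endstream", re.DOTALL)
# Text rendering mode 3 = neither fill nor stroke, i.e. invisible text.
_INVISIBLE_MODE_RE = re.compile(rb"(?<![\d.])3\s+Tr\b")


def has_invisible_text(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream sets text rendering mode 3."""
    for match in _STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # decompressobj tolerates a truncated tail instead of raising
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not FlateDecode data: scan the raw bytes instead
        if _INVISIBLE_MODE_RE.search(data):
            return True
    return False


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print(has_invisible_text(f.read()))
```

A hit suggests an existing text layer; binary image data can occasionally produce a false positive, so treat the result as a hint rather than proof.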
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
Is it possible to know whether a file has already been OCR'd when it passes through the PDF-XChange SDK?
Any updates on the other issue?
Thanks
fya
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Hello crimsonlogic,
Maybe it's better to look at the PDF generator and its options, so that it won't generate any text?
By "the other issue", do you mean the 17-page problem?
Cheers,
Alex
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
Yes, we need the solution for the 17-page error.
Thanks
fya
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
I don't understand your statement:
"Maybe it's better to look at the PDF generator and its options, so that it won't generate any text?"
The PDF program we provided performs the OCR, which causes the overlay. What do you mean by the PDF generator?
The other issue involves a Word file that is converted to PDF format and then OCR'd.
The conversion to PDF has no issue,
whereas the OCR process throws an error.
Please try the program; we made the effort to build it to show the issue.
Please get the developer to look at the code if you are not able to do so.
We need a solution ASAP, as we reported these issues over a week ago with no progress.
thanks
fya
- Ivan - Tracker Software
- Site Admin
- Posts: 3550
- Joined: Thu Jul 08, 2004 10:36 pm
- Location: Vancouver Island - Canada
Re: OCR of pdf and pictures
Regarding "yes, we need the solution of the 17 pages error":
As we already mentioned, the problem is that your process is 32-bit.
32-bit processes have a limited address space available and, more importantly, in modern OSes the Address Space Layout Randomization (https://en.wikipedia.org/wiki/Address_s ... domization) technology makes this address space highly fragmented, so the application often cannot allocate a big contiguous memory buffer (for example, rasterizing one Letter page at 300 dpi requires about 32 MB of memory).
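The 32 MB figure can be checked with simple arithmetic: a Letter page (8.5 × 11 in) at 300 dpi is 2550 × 3300 pixels, and at an assumed 4 bytes per pixel (32-bit colour) that is 33,660,000 bytes, about 32 MiB, requested as one contiguous block. A small Python sketch (the helper name is hypothetical) for other sizes; at the 200 DPI raster_dpi used in the posted OCR code the same page needs 14,960,000 bytes.

```python
def raster_bytes(width_in: float, height_in: float, dpi: int,
                 bytes_per_pixel: int = 4) -> int:
    """Size of an uncompressed page bitmap; 4 bytes/pixel assumes 32-bit colour."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    for dpi in (200, 300):
        size = raster_bytes(8.5, 11, dpi)
        print(f"Letter @ {dpi} dpi: {size:,} bytes (~{size / 2**20:.1f} MiB)")
```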
The only possible solutions I can recommend here are:
1. Create a separate .exe that OCRs the document, and turn off ASLR for this .exe (I am not sure whether .NET allows that).
2. Convert your app to 64-bit.
HTH
Tracker Software (Project Director)
When attaching files to any message - please ensure they are archived and posted as a .ZIP, .RAR or .7z format - or they will not be posted - thanks.
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi,
As Alex said above, the overlaid text is due to the PDF we use having already been OCR'd. How can we tell whether a PDF has already been OCR'd?
We have another problem converting Word files to PDF. Our code is as follows:
First, we open one Word document (doc1.docx). Then we launch our application and upload another Word document (doc2.docx), which runs the code below to convert it to PDF. The default printer is set to a physical printer.
The code below still uses the physical printer instead of the PDF-XChange Printer: doc2.docx is printed on the physical printer instead of being converted to PDF. Please advise ASAP, as this issue is stopping business flows in our live system.
PDFPrinter = (PXCComLib5.CPXCPrinter)prnFactory.get_Printer("", "PDF-XChange Printer 2012", regkey, devcode);
PDFPrinter.Option["Save.ShowSaveDialog"] = false;
PDFPrinter.Option["Save.RunApp"] = false;
PDFPrinter.Option["Save.Path"] = pdfpath;
PDFPrinter.Option["Save.WhenExists"] = 1; //overwrite
PDFPrinter.SetAsDefaultPrinter();
System.Diagnostics.Process printJob = new System.Diagnostics.Process();
printJob.StartInfo.FileName = inputfile;
printJob.StartInfo.UseShellExecute = true;
printJob.StartInfo.Verb = "print";
printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
printJob.Start();
printJob.WaitForExit(60000);
PDFPrinter.RestoreDefaultPrinter();
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
Re: OCR of pdf and pictures
Hello crimsonlogic,
We suspect that this is a Windows 10 issue.
Please try this; we have just tested this code and it worked for us:
Code:
PXCComLib5.CPXCPrinter PDFPrinter;
PXCComLib5.CPXCControlEx prnFactory = new PXCComLib5.CPXCControlEx();
PDFPrinter = (PXCComLib5.CPXCPrinter)prnFactory.get_Printer("", "PDF-XChange Printer 2012", regkey, devcode);
PDFPrinter.Option["Save.ShowSaveDialog"] = false;
PDFPrinter.Option["Save.RunApp"] = false;
PDFPrinter.Option["Save.Path"] = ocrfile;
PDFPrinter.Option["Save.WhenExists"] = 1; //overwrite
System.Diagnostics.Process printJob = new System.Diagnostics.Process();
printJob.StartInfo.FileName = inputfile;
printJob.StartInfo.UseShellExecute = true;
printJob.StartInfo.Verb = "printto";
printJob.StartInfo.Arguments = "\"" + PDFPrinter.Name + "\"";
printJob.StartInfo.WindowStyle = System.Diagnostics.ProcessWindowStyle.Minimized;
printJob.Start();
printJob.WaitForExit(60000);
return "ok";
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Support,
I converted my application to 64-bit, following Tracker's advice.
Now I am not able to convert image files to PDF. I've replaced all the DLLs with those from the Bin.64 folders under Tracker Software\PDF-XChange PRO 5 SDK\Examples.
Our code is as follows:
if (Common.IS_DS_FAILED(PDFXC_Funcs.PXC_NewDocument(out pdf, regkey, devcode)))
resultstr += "ConvertOthersToOCR: IS_DS_FAILED";
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Author, "Tracker Software");
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Title, "PDF-XChange 4.0 Examples");
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Creator, "PDF-XChange 4.0");
PDFXC_Funcs.PXC_SetDocumentInfoA(pdf, PDFXC_Funcs.PXC_StdInfoField.InfoField_Keywords, "PDF-XChange; Examples; 4.0; C#");
PDFXC_Funcs.PXC_EnableLinkAnalyzer(pdf, true);
PDFXC_Funcs.PXC_SetCompression(pdf, false, false, PDFXC_Funcs.PXC_CompressionType.ComprType_C_Auto,
75, PDFXC_Funcs.PXC_CompressionType.ComprType_I_Auto, PDFXC_Funcs.PXC_CompressionType.ComprType_M_Auto);
int res = PDFXC_Funcs.PXC_AddPage(pdf, Common.PW, Common.PH, out page);
if (Common.IS_DS_FAILED(res))
resultstr += "ConvertOthersToOCR: " + res;
cpage = page;
double iw, ih;
res = PDFXC_Funcs.PXC_AddImageA(pdf, inputfile, out p);
if (Common.IS_DS_FAILED(res))
resultstr += "ConvertOthersToOCR: " + res;
PDFXC_Funcs.PXC_GetImageDimension(pdf, p, out iw, out ih);
PDFXC_Funcs.PXC_PlaceImage(cpage, p, Common.I2L(1), Common.PH - Common.I2L(1), Common.I2L(7), Common.I2L(8));
PDFXC_Funcs.PXC_WriteDocumentExA(pdf, extractfile, extractfile.Length, fl, "");
PDFXC_Funcs.PXC_ReleaseDocument(pdf);
I am getting error code -2113667071 from the line below, and no PDF is generated.
res = PDFXC_Funcs.PXC_AddImageA(pdf, inputfile, out p);
Please advise.
Thank you very much.
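On the image-size question from the start of the thread: the posted code places the image at a fixed size, Common.I2L(7) by Common.I2L(8), and never uses the iw and ih values returned by PXC_GetImageDimension. A minimal sketch of scaling an image to fit an area while keeping its aspect ratio follows; FitImage is a hypothetical helper, and it assumes iw, ih and the target area are expressed in the same units.

```csharp
using System;

// Hypothetical helper: scales an image of size (iw, ih) to fit inside a box of
// size (maxW, maxH) while keeping its aspect ratio. All values share one unit.
(double W, double H) FitImage(double iw, double ih, double maxW, double maxH)
{
    if (iw <= 0 || ih <= 0)
        throw new ArgumentException("Image dimensions must be positive.");
    // Largest scale factor that still fits in both directions (may enlarge the image)
    double scale = Math.Min(maxW / iw, maxH / ih);
    return (iw * scale, ih * scale);
}

// Example: a 200 x 100 image fitted into a 500 x 500 area becomes 500 x 250.
var (w, h) = FitImage(200, 100, 500, 500);
Console.WriteLine($"{w} x {h}");
```

Assuming the last two arguments of PXC_PlaceImage are the placed width and height, the computed w and h would replace the hard-coded Common.I2L(7) and Common.I2L(8); check the SDK documentation for the units PXC_GetImageDimension returns before relying on this.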
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: OCR of pdf and pictures
Hello crimsonlogic,
Please do not post bare error codes; use the PXC_Err_FormatErrorCode method to get the error message.
The error code that you've provided means Invalid Argument.
The code sample does not contain enough information to diagnose the call to that method.
Please provide a sample with the full data needed to reproduce the problem.
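For reference, a result code like this is often easier to look up in hexadecimal. The sketch below uses hypothetical helpers ToHex and IsFailure to print -2113667071 as 0x82040001; treating any negative value as a failure is an assumption about what the IS_DS_FAILED checks in the posted code test, not a documented SDK rule.

```csharp
using System;

// Renders a 32-bit result code as unsigned hexadecimal.
string ToHex(int code) => "0x" + unchecked((uint)code).ToString("X8");

// Assumption: a set high bit (a negative int) marks a failed call.
bool IsFailure(int code) => code < 0;

int res = -2113667071; // the code reported for PXC_AddImageA in this thread
Console.WriteLine($"{ToHex(res)} failed={IsFailure(res)}");
```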
-
- User
- Posts: 38
- Joined: Tue Jan 12, 2016 2:25 am
Re: OCR of pdf and pictures
Hi Sasha,
We are uploading a sample project (TestPDFXChangeORG.zip) to Tracker's FTP. Please unzip it with the password sent in a separate email to 'support@pdf-xchange.com'.
The sample data file (CL.TIF) is in Temp.zip.
Please advise how we can use PXC_Err_FormatErrorCode in our program too.
Thank you very much.
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: OCR of pdf and pictures
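How to use the PXC_Err_FormatErrorCode method:
Code: Select all
// The SDK writes a NUL-terminated message string into this byte buffer
byte[] bytes = new byte[256];
PDFXC_Funcs.PXC_Err_FormatErrorCode(-2113667071, bytes, bytes.Length);
// Decode only the bytes before the terminating NUL, so the string has no trailing '\0' padding
int len = System.Array.IndexOf(bytes, (byte)0);
string str = System.Text.Encoding.ASCII.GetString(bytes, 0, len < 0 ? bytes.Length : len);
When you need to include an error code in a post, please post the error message along with the code itself.
Cheers,
Alex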
-
- User
- Posts: 5522
- Joined: Fri Nov 21, 2014 8:27 am
- Contact:
Re: OCR of pdf and pictures
Hello crimsonlogic,
I've updated the zip archive ClassLibrary1.zip with the same password that you've specified.
The problem was the int type: in C#, int is always a 32-bit value, so when you switched to x64 the 64-bit pointers stored in int variables were truncated and became corrupted. I've changed them to IntPtr, and everything now works properly.
In the archive there are files that I modified.
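The underlying problem can be shown without the SDK. The sketch below (example values only) stores a 64-bit handle value in an int and widens it back, showing that the upper 32 bits are lost; IntPtr avoids this because its size follows the process bitness (4 bytes in an x86 process, 8 in an x64 process).

```csharp
using System;

// int is always 4 bytes; IntPtr.Size depends on whether the process is 32- or 64-bit.
Console.WriteLine($"sizeof(int) = {sizeof(int)}, IntPtr.Size = {IntPtr.Size}");

// Example handle value above 4 GB, as a 64-bit process may receive.
long handle = 0x1_2345_6789L;

// Storing it in an int silently drops the upper 32 bits...
int truncated = unchecked((int)handle);

// ...so widening it back does not restore the original handle.
long restored = truncated;
Console.WriteLine($"original 0x{handle:X}, after int round-trip 0x{restored:X}");
```

In the posted code, the handles returned through out parameters (pdf, page and possibly p) are the likely candidates for IntPtr, along with the matching PDFXC_Funcs declarations; the exact changes are in the modified files in the archive.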
HTH,
Alex