PDF library crash

PDF-X OCR SDK is a new product from us, intended to complement our existing PDF and imaging tools and to provide developers with an expanding set of professional tools for Optical Character Recognition tasks.

Moderators: TrackerSupp-Daniel, Tracker Support, Vasyl-Tracker Dev Team, Chris - Tracker Supp, Sean - Tracker, Tracker Supp-Stefan

scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

I noticed that the date has changed (to today), but the live version of the OCR .dll is still reporting itself as 1.0.4, not 1.0.5.

Please advise.

Thanks!

Shaun
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: PDF library crash

Post by John - Tracker Supp »

Hi Shaun, the SDK has been updated as previously described, and the live DLL download should be up in an hour or so. I would do it now, but unfortunately Walter only sent me the demo DLL included in the full SDK install, so I will have to await his arrival this morning before the live DLL can be 'refreshed'.

Cheers
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: PDF library crash

Post by John - Tracker Supp »

Hi Shaun,

It's up now, so if you give it 30 minutes or so for our cloud servers to sync, all should be well. Or download it directly from here now:

www.docu-track.co.uk/PDFX_OCR_SDK_LIVE.zip

Cheers
scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

Hello,

I've downloaded the latest version of the live .dll, and it works fine in most of my tests, but in one of the tests, I'm getting an error 10007, which I don't see documented in the header file. Any idea what's going on here?

I've narrowed the problem down to the machine that the tests are running on. On one of our test machines, everything works fine. On the other, all of the tests we run seem to fail.

Thanks!

Shaun
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: PDF library crash

Post by Tracker Supp-Stefan »

Hello Shaun,

This might be quite obvious - but have you double checked that all the files are updated to the latest build on the failing test machine, and if they are - have you come up with any ideas why this machine might be failing - are there any hardware/software differences between the failing and non-failing ones?

Best,
Stefan
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Any operating system or hardware architecture differences of note (XP vs Vista vs Win 7, 32 vs 64 bit)? Significant memory differences between them?

What about free disk space? Some of the PDF and image-related functions used internally cache to disk. I have a suspicion that this may be the problem.

I have looked up this error code and I cannot find it in either our error codes or in the Windows error code lookup tool. Is this hex or decimal? Is it negative or positive? Is it the complete error code or just, e.g., the low word?
scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

The error code is just the low word (i.e. after calling DS_GET_ECODE).

I finally found ocr_errors.h, and it looks like this might be:

#define OCR_ERR_INTERNAL OCR_MAKE_ERROR(DS_MAX_COMMON_ERROR_CODE + 8)

(DS_MAX_COMMON_ERROR_CODE is 9999).
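
The arithmetic behind that lookup can be sketched in C. This is only an illustration: the real DS_GET_ECODE and OCR_MAKE_ERROR macros live in the SDK headers and are not reproduced in this thread, so SKETCH_GET_ECODE below is a hypothetical stand-in that assumes "low word" means the low 16 bits of a 32-bit result. The only values taken from the thread are 9999 and the + 8 offset.

```c
#include <assert.h>
#include <stdint.h>

/* From the thread: DS_MAX_COMMON_ERROR_CODE is 9999. */
#define DS_MAX_COMMON_ERROR_CODE 9999

/* ASSUMPTION: hypothetical stand-in for the SDK's DS_GET_ECODE.
   It simply keeps the low 16 bits of a 32-bit result value. */
#define SKETCH_GET_ECODE(result) ((uint32_t)(result) & 0xFFFFu)

/* Low-word value expected for OCR_ERR_INTERNAL:
   DS_MAX_COMMON_ERROR_CODE + 8 = 9999 + 8 = 10007. */
static uint32_t sketch_internal_error_low_word(void)
{
    return (uint32_t)(DS_MAX_COMMON_ERROR_CODE + 8);
}

/* Returns 1 if the low word of `result` equals the expected value. */
static int sketch_is_internal_error(int32_t result)
{
    return SKETCH_GET_ECODE(result) == sketch_internal_error_low_word();
}
```

Under these assumptions, a low word of 10007 (0x2717 in hex) lines up with the + 8 entry in ocr_errors.h, whatever the high word contains.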

The operating systems are completely different (Win 7 in the working case vs. XP), and the hardware specs are quite a bit different as well.

I have some additional info now, though.

I installed an upgraded PDFTools library from the latest build, and that fixed the problem. Part of what threw me was that I never used PDFTools on that machine; it had just been installed ages ago. My guess is that the new changes in the OCR library required some of the functionality of the newer PDFTools, whereas the old OCR library worked fine with the older version.

Unfortunately, I didn't pay attention to which version I overwrote when I installed the new version, so I don't have any more information on that.

In any event, the problem went away when I installed the latest PDFTools, as recommended earlier.

Thanks!

Shaun
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

The ocrtools dll functions in a stand-alone manner although of course we re-use some of the PDF TOOLS functionality internally (static linkage - no other DLLs required). All I can think of is that one of the examples used the PDF Viewer SDK. Were you using this example by any chance?

-Walter

scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

No, I was using our program, which makes use of the ocr_tools.dll directly.

We've started seeing this on another machine now, with the deployed version of our program, and this time, reinstalling PDF Exchange PRO is not an option. I did copy all of the latest versions of the .dlls to the target machine and restarted the machine, but every time we try to make searchable (in our program), we get the 10007 error.

Thanks!

Shaun
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Are you able to report the full error code? Have you checked disk space? How big are the documents? Can you provide an example to our support email box?
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

I have a suggestion for you in the meantime, as we sort this out. The default behaviour when auto-rotation fails is to abort the job and return an error code. This was a choice made purposely to make it clear that things did not proceed as expected. However, we are finding that there are certain types of page (e.g. blank ones with noise on them, or pure images) for which auto-rotation often fails. In these cases the preferred behaviour would be to ignore the auto-rotation failure and proceed with OCR anyway, but return a warning. This will be the case in the next build (version 1.0.6+ of the DLL). However, for now a workaround is to turn off auto-rotation and try again (i.e., don't use the OCR_Image_Autorotate flag).

Could you try this and let me know if it helps?

-Walter
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

A new build, which resolves the behaviour whereby OCR jobs are aborted if a page cannot be auto-rotated, is going up shortly. I would recommend downloading the main installer to update your header files as there is a new warning code, OCR_WRN_NOTROTATED, that indicates an auto-rotation (deskew) was not performed on one (or more) pages during a job.

I believe the live DLL is now up, but you will probably want to get the headers from the main installer as well.

The new version is 1.0.6 and all subsequent versions will behave in this way, instead of the previous way.

This should resolve a number of scenarios where documents are not fully OCRd, returning OCR_ERR_INTERNAL.
scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

Thanks, Walter!

FYI, when incorporating this into our project, I got linker errors on compile (unresolved externals for all of the OCR_<whatever> functions).

When I replaced ocrtools.lib in the Examples/lib directory with the live version, those unresolved externals errors went away. I didn't notice anything weird about the one that I replaced with the live lib, but I wanted you guys to be aware of that issue. It doesn't affect me, since I have the live version, but it might affect someone who doesn't.

Then again, it's late, and I'm tired, so I might be hallucinating the entire thing :).

Shaun
scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

I've discovered the cause of the 10007 issue that I was having. It turned out to be an invalid OCRLanguages directory. It's a long story as to how I ended up having a bad directory in there without knowing it, and how the problem magically went away when I rebooted, but I am very confident that was the issue.

The only thing that could have helped me was a more specific error message in that condition.

Thanks!

Shaun
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Glad it was resolved. We will go through at some point in the near future and add some more specific error codes to help with troubleshooting these kinds of issues.
NBachus
User
Posts: 31
Joined: Tue Oct 26, 2010 2:40 pm

Re: PDF library crash

Post by NBachus »

We need to revisit this issue. After updating ocrtools.dll to the latest version (1.10.2), we are still receiving this error on machines when trying to OCR a PDF file. The PDF file is rather small: it's only eight pages and 2 MB in size. When OCRing using the DLL, it responds with the 10007 error. I have downloaded the latest Viewer and OCR'ed the same document successfully after leaving it alone for 10 minutes. How can we proceed?

Thanks,
Nathan
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Hi Nathan, the latest version of the DLL is 1.0.13.x - please make sure you are using this version.

It seems likely that there was some error with the language directory and/or language setting in your code. Can you confirm that you are placing the language file (e.g. "eng_pxvocr.dat" for English) in the correct directory, as specified in your code? Remember to add the additional directory "ocrdats": e.g. if you specify "c:\langs", put "eng_pxvocr.dat" in "c:\langs\ocrdats".
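
Since a missing language file surfaces only as the generic 10007 error, a pre-flight check before starting OCR may save some debugging time. The following is a minimal sketch in standard C, not part of the SDK: the helper names are hypothetical, and the "<lang>_pxvocr.dat" naming pattern is an assumption generalised from the single "eng_pxvocr.dat" example above.

```c
#include <assert.h>
#include <stdio.h>

/* Builds "<lang_dir>\ocrdats\<lang>_pxvocr.dat" into buf.
   Returns 0 on success, -1 if the path does not fit in the buffer.
   ASSUMPTION: the file name pattern is inferred from "eng_pxvocr.dat". */
static int sketch_lang_file_path(char *buf, size_t buf_size,
                                 const char *lang_dir, const char *lang)
{
    assert(buf != NULL && lang_dir != NULL && lang != NULL);
    int n = snprintf(buf, buf_size, "%s\\ocrdats\\%s_pxvocr.dat",
                     lang_dir, lang);
    return (n < 0 || (size_t)n >= buf_size) ? -1 : 0;
}

/* Returns 1 if the language data file can be opened for reading,
   0 otherwise (including when the path is too long). */
static int sketch_lang_file_exists(const char *lang_dir, const char *lang)
{
    char path[512];
    if (sketch_lang_file_path(path, sizeof path, lang_dir, lang) != 0)
        return 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}
```

For example, sketch_lang_file_exists("c:\\langs", "eng") would look for c:\langs\ocrdats\eng_pxvocr.dat, so the application can report a specific "language file not found" message of its own instead of passing a bad directory to the DLL.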
NBachus
User
Posts: 31
Joined: Tue Oct 26, 2010 2:40 pm

Re: PDF library crash

Post by NBachus »

Walter,

You are correct! Once I created the path we were referencing and added in the dat and lng files, the conversion was completed. I viewed the file, and it looks like the file was corrupted in conversion. Reviewing and re-trying.

-Nathan
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

If you have trouble with unexpected results (when OCR appears to finish), you can attach the input file and output file here (or forward to us at support@pdf-xchange.com if you don't want to post it on the forums) and we will check them to try to determine what is happening.
NBachus
User
Posts: 31
Joined: Tue Oct 26, 2010 2:40 pm

Re: PDF library crash

Post by NBachus »

Walter,

I replaced the dat files with some others we had been using, and it worked. What measures can we take to improve performance? We are seeing different results all over the board for the same conversion on different machines. I am only able to find 1.10.2 as downloadable on the site. Is there a link you can provide for 1.0.13? Also, would newer dat/language files improve performance?

Thanks,
Nathan
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

You're right - there is a technical issue with our website that is causing the wrong version to be retrieved. We are working on resolving it.

Performance depends highly on the input (size and complexity), but some measures you can take would be to reduce the resolution to 300 or 150 DPI by setting the raster_dpi parameter in the PXO_Options struct that you pass to OCR_MakeSearchable(). You can also set the flag OCR_Image_FastAutorotate (for fast deskew mode) or OCR_Image_NoRotate (if deskew is not needed for your document) in PXO_Options::ImageFlags.
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Our web server administrators have sorted out the glitch and it is available now.

-Walter
NBachus
User
Posts: 31
Joined: Tue Oct 26, 2010 2:40 pm

Re: PDF library crash

Post by NBachus »

I have sent a document in e-mail that proceeds to crash when attempting to OCR on the latest ocrtools.dll (32bit).
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Many thanks
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Can you give some details about the nature of the crash?

What DPI were you setting for OCR? What error or exception was returned?

-Walter
NBachus
User
Posts: 31
Joined: Tue Oct 26, 2010 2:40 pm

Re: PDF library crash

Post by NBachus »

We are using 150dpi and the crash error is in a dialog box "Microsoft Visual C++ Runtime Library" - abnormal program termination.
scdawson
User
Posts: 43
Joined: Thu Oct 20, 2011 3:40 pm

Re: PDF library crash

Post by scdawson »

Here is a little bit more information about where the crash is occurring. Looks like an assert is failing:

bb_it.data()->owner() == this:Error:Assert failed:in file ..\textord\colpartition.cpp, line 205
Shaun
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

Thanks, will investigate.
Walter-Tracker Supp
User
Posts: 381
Joined: Mon Jun 13, 2011 5:10 pm

Re: PDF library crash

Post by Walter-Tracker Supp »

This comes from a bug in the OCR engine which was throwing an unhandled exception.

For the moment the best solution is for us to handle the exception internally and return an error code so you can handle the failure gracefully. I will provide a new build for download on the website shortly (version 1.0.14).

The problem is not present in the updated OCR engine used in our editor but I will have to discuss whether or not we can implement the fix, and what the timeline would be, for the current SDK.