PXCp_ET_AnalyzePageContent Issue

jeffp
User
Posts: 914
Joined: Wed Sep 30, 2009 6:53 pm

PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

I have written a function to see if there is any text in a PDF document page. The code is below.

However, when I call this function a second time I get an access violation (AV) on the line that calls PXCp_ET_AnalyzePageContent. The first call works fine. Is there something I'm not cleaning up correctly in my function?

Example:

HasAnyText(1,1);
HasAnyText(2,1); // Raises an exception at PXCp_ET_AnalyzePageContent

Code:

function TPDFLibEx.HasAnyText(APage, APagesToCheck: Integer): Boolean;
var
  hr: HRESULT;
  i, AStart, AEnd: Integer;
  ACount: DWORD;
  ATextElement: PXP_TextElement;
begin
  Result := False;
  if (APage > 0) then
  begin
    AStart := APage;
    AEnd := APage;
  end else
  begin
    AStart := 1;
    if (APagesToCheck <= 0) then APagesToCheck := Get_PageCount;
    AEnd := Min(APagesToCheck, Get_PageCount);
  end;

  PXCp_ET_Prepare(FDocID);
  try
    for APage := AStart to AEnd do
    begin
      ACount := 0;
      PXCp_ET_AnalyzePageContent(FDocID, APage - 1); //Error here when called second time
      PXCp_ET_GetElementCount(FDocID, @ACount);
      if (ACount > 0) then
      begin
        Result := True;
        break;
      end;
    end;
  finally
    PXCp_ET_Finish(FDocID);
  end;
end;

Re: PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

I ran this on some other PDF docs and it doesn't happen on all of them. Here's one where the AV occurs as described in my post.
Attachments: OCRText.zip (186.87 KiB)
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: PXCp_ET_AnalyzePageContent Issue

Post by Lzcat - Tracker Supp »

Reproduced with your file. This crash can actually happen with many other files too: the problem was that subsequent calls to PXCp_ET_Prepare were not handled fully correctly. It is now fixed, and the fix will be available in the next build.
Here are some notes about your code:
1. It is unsafe. You never check the results returned from our functions. That is not a problem when everything is OK, but if a function fails for any reason, subsequent calls may result in a crash (they should not, but ...).
2. PXCp_ET_Prepare and PXCp_ET_Finish should be used as a pair. There is no need to call PXCp_ET_Prepare for each page, and there must be a call to PXCp_ET_Finish once text extraction is finished. Moreover, PXCp_ET_Prepare is a very time-consuming operation on large documents, so calling it too often is not recommended. Work with text should look similar to the following:
PXCp_ET_Prepare
process as many pages as you need, but DO NOT MODIFY document
PXCp_ET_Finish
do anything else
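
For illustration only (this is not code from Tracker): a minimal sketch of that sequence applied to the HasAnyText function from the first post, with the result of every SDK call checked. It reuses the names from the original code (TPDFLibEx, FDocID, Get_PageCount) and the four PXCp_ET_* calls exactly as they were called there. Failed comes from the Delphi Windows unit and Min from Math. Treating a failed PXCp_ET_Prepare as needing no matching PXCp_ET_Finish, and stopping at the first failing page, are assumptions made for the sketch.

```pascal
// Sketch only: relies on the TPDFLibEx class, the FDocID field and the
// Get_PageCount helper from the original post, plus the SDK import unit.
function TPDFLibEx.HasAnyText(APage, APagesToCheck: Integer): Boolean;
var
  i, AStart, AEnd: Integer;
  ACount: DWORD;
begin
  Result := False;
  if APage > 0 then
  begin
    // Check a single page.
    AStart := APage;
    AEnd := APage;
  end
  else
  begin
    // Check the first APagesToCheck pages (all pages if <= 0).
    AStart := 1;
    if APagesToCheck <= 0 then
      APagesToCheck := Get_PageCount;
    AEnd := Min(APagesToCheck, Get_PageCount);
  end;

  // Prepare once for the whole range. Assumption: if Prepare fails,
  // there is nothing to analyze and no Finish is required.
  if Failed(PXCp_ET_Prepare(FDocID)) then
    Exit;
  try
    // A local loop variable is used instead of the APage parameter.
    for i := AStart to AEnd do
    begin
      ACount := 0;
      // Page index is zero-based, as in the original code.
      if Failed(PXCp_ET_AnalyzePageContent(FDocID, i - 1)) then
        Break;
      if Failed(PXCp_ET_GetElementCount(FDocID, @ACount)) then
        Break;
      if ACount > 0 then
      begin
        Result := True;
        Break;
      end;
    end;
  finally
    // Always paired with the successful Prepare above.
    PXCp_ET_Finish(FDocID);
  end;
end;
```

With this shape, HasAnyText(0, 2) covers pages 1 and 2 under a single Prepare/Finish pair, rather than calling HasAnyText(1,1) and HasAnyText(2,1) separately. Note that the sketch returns False both when no text is found and when an SDK call fails; a caller that needs to tell the two apart would have to surface the HRESULT as well.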
HTH
Victor
Tracker Software
Project manager

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.

Re: PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

Thanks for the tips. I'll redo my code.

When do you expect to release the next update with this fix?
Tracker Supp-Stefan
Site Admin
Posts: 17824
Joined: Mon Jan 12, 2009 8:07 am
Location: London

Re: PXCp_ET_AnalyzePageContent Issue

Post by Tracker Supp-Stefan »

The build was released tonight (EU time; morning in the US).

Cheers,
Stefan

Re: PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

Thanks.

Also, is there a URL that details all the changes and fixes made in the current release?

Re: PXCp_ET_AnalyzePageContent Issue

Post by Tracker Supp-Stefan »

Hi jeffp,

There is one for the end-user products, but I am afraid that for the moment there is no version history for the SDKs :(

However, the end-user version histories do serve, for the most part, as a valid indicator of what has been corrected in all products.

Regards,
Stefan

Re: PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

Thanks.

By the way, there has been chat on the forum about putting all the SDK DLLs (i.e., Viewer, Lib, Lib Pro) into one at some point. Do you know if a decision to move in that direction has been made? At times, I am forced to open a file in one SDK and do some items, only to have to close it and re-open it in another SDK to do some other items.

I've never gotten back a definitive answer as to if and when.
John - Tracker Supp
Site Admin
Posts: 5219
Joined: Tue Jun 29, 2004 10:34 am
Location: United Kingdom

Re: PXCp_ET_AnalyzePageContent Issue

Post by John - Tracker Supp »

The Viewer functions will continue to be expanded and will to some extent rely on the functionality built into other elements of our libraries, but they will not be merged. The PDF-Tools SDK libraries, however, will be merged as far as possible.

So the functions in the pxclib40/xcpro40 DLLs will merge to become one, to make common functions for new and existing files much easier, where possible, to share, maintain and pass info to/from ...

HTH
If posting files to this forum - you must archive the files to a ZIP, RAR or 7z file or they will not be uploaded - thank you.

Best regards
Tracker Support
http://www.tracker-software.com

Re: PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

Thanks John.

Right now the only call I have to make in the Viewer DLL is PXCV_DrawPageToDC in order to get an image of the page for OCR purposes. I prefer this over image extraction since often the page contains multiple images that have to be put back together.

For everything else I use the pxclib40/xcpro40 DLLs to create a PDF with OCR text and then merge it back into the existing file using PlaceContents.

I was looking for a way to open the file only once, but it looks like this won't be possible if the Viewer DLL won't be working with the pxclib40/xcpro40 dll's.

Is there any other way to get a BMP image of a PDF page using the pxclib40/xcpro40 DLLs, without extracting the image?

Re: PXCp_ET_AnalyzePageContent Issue

Post by John - Tracker Supp »

Hi Jeff,

No, the libraries require that you extract the page as an image, assuming it is an image-based PDF (if it is text-based, the library DLLs cannot help at all to convert it to an image; they have no such functionality), whereas the Viewer allows you alternatives, as it would appear you have already found.

So I am afraid for now at least that is the only option we have to offer.

Re: PXCp_ET_AnalyzePageContent Issue

Post by jeffp »

No worries. It's only a slight issue on big documents.

Your libraries are fabulous, so don't take anything I say as complaining, just exploring possibilities.

Re: PXCp_ET_AnalyzePageContent Issue

Post by John - Tracker Supp »

Thanks Jeff - appreciate the kind words :)