This forum is for software developers requiring help and assistance with the library DLL functions of Tracker Software's PDF-Tools SDK only. Please use the PDF-XChange Drivers API SDK forum for assistance with all PDF print driver related topics, or the PDF-XChange Viewer SDK forum if appropriate.
I have written a function to see if there is any text in a PDF document page. The code is below.
However, when I call this function a second time I get an access violation (AV) on the line that calls PXCp_ET_AnalyzePageContent. The first time I call the function it works fine. Is there something I'm not cleaning up correctly in my function?
Example:
HasAnyText(1,1);
HasAnyText(2,1); // Raises an exception at PXCp_ET_AnalyzePageContent
function TPDFLibEx.HasAnyText(APage, APagesToCheck: Integer): Boolean;
var
  hr: HRESULT;
  i, AStart, AEnd: Integer;
  ACount: DWORD;
  ATextElement: PXP_TextElement;
begin
  Result := False;
  if (APage > 0) then
  begin
    AStart := APage;
    AEnd := APage;
  end else
  begin
    AStart := 1;
    if (APagesToCheck <= 0) then APagesToCheck := Get_PageCount;
    AEnd := Min(APagesToCheck, Get_PageCount);
  end;
  PXCp_ET_Prepare(FDocID);
  try
    for APage := AStart to AEnd do
    begin
      ACount := 0;
      PXCp_ET_AnalyzePageContent(FDocID, APage - 1); // Error here when called second time
      PXCp_ET_GetElementCount(FDocID, @ACount);
      if (ACount > 0) then
      begin
        Result := True;
        break;
      end;
    end;
  finally
    PXCp_ET_Finish(FDocID);
  end;
end;
Reproduced with your file. Actually, this crash may happen with many other files too. The problem was that subsequent calls to PXCp_ET_Prepare were not handled fully correctly. It is now fixed, and the fix will be available with the next build.
And here are some notes about your code:
1. It is unsafe. You never check the results returned from our functions. That is not a problem when everything is OK, but if a function fails for any reason, subsequent calls may result in a crash (they should not, but ...).
2. The functions PXCp_ET_Prepare and PXCp_ET_Finish should be used as a pair. There is no need to call PXCp_ET_Prepare for each page, and there must be a call to PXCp_ET_Finish after finishing text extraction. Moreover, PXCp_ET_Prepare is a very time-consuming operation (on large documents), so it is not recommended to call it too often. Work with text should look similar to the following:
PXCp_ET_Prepare
process as many pages as you need, but DO NOT MODIFY the document
PXCp_ET_Finish
do anything else
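To make both notes concrete, here is a rough sketch of the same function with every return value checked and a single Prepare/Finish pair around the page loop. It is only an illustration under assumptions: the PXCp_ET_* calls are used exactly as they appear in the posted code, they are assumed to return an HRESULT, and their declarations are assumed to come from the SDK's Delphi header unit. Failed comes from the Windows unit and Min from the Math unit. APageNo is a local variable introduced here for the loop.

```pascal
// Sketch only: the PXCp_ET_* call shapes are copied from the original post and
// are assumed to return HRESULT. FDocID and Get_PageCount belong to the
// poster's TPDFLibEx class.
function TPDFLibEx.HasAnyText(APage, APagesToCheck: Integer): Boolean;
var
  hr: HRESULT;
  APageNo, AStart, AEnd: Integer;
  ACount: DWORD;
begin
  Result := False;
  if APage > 0 then
  begin
    AStart := APage;
    AEnd := APage;
  end
  else
  begin
    AStart := 1;
    if APagesToCheck <= 0 then
      APagesToCheck := Get_PageCount;
    AEnd := Min(APagesToCheck, Get_PageCount);
  end;

  // Prepare once for the whole page range; give up if it fails.
  hr := PXCp_ET_Prepare(FDocID);
  if Failed(hr) then
    Exit;
  try
    for APageNo := AStart to AEnd do
    begin
      // Stop at the first failure instead of calling further functions.
      hr := PXCp_ET_AnalyzePageContent(FDocID, APageNo - 1);
      if Failed(hr) then
        Break;
      ACount := 0;
      hr := PXCp_ET_GetElementCount(FDocID, @ACount);
      if Failed(hr) then
        Break;
      if ACount > 0 then
      begin
        Result := True;
        Break;
      end;
    end;
  finally
    // Reached only after a successful Prepare, so the two calls stay paired.
    PXCp_ET_Finish(FDocID);
  end;
end;
```

This version returns False on any failure, which hides the difference between "no text" and "could not analyze"; raising an exception or returning the HRESULT may suit your application better.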
HTH
Victor
Tracker Software
Project manager
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
By the way, there has been chat on the forum about putting all the SDK DLLs into one at some point (i.e., Viewer, Lib, Lib Pro). Do you know if a decision to move in that direction has been made? At times, I am forced to open a file in one SDK and do some items, only to have to close it and re-open it in another SDK to do some other items.
I've never gotten a definitive answer as to if and when.
The Viewer functions will continue to be expanded and will to some extent rely on functionality built into other elements of our libraries, but they will not be merged. However, the PDF-Tools SDK libraries will be merged as far as possible.
So the functions in the pxclib40/xcpro40 DLLs will merge to become one, to make common functions for new and existing files much easier to share, maintain and pass info to/from where possible ...
HTH
Right now the only call I have to make in the Viewer DLL is PXCV_DrawPageToDC in order to get an image of the page for OCR purposes. I prefer this over image extraction since often the page contains multiple images that have to be put back together.
For everything else I use the pxclib40/xcpro40 DLLs to create a PDF with OCR text and then merge it back into the existing file using PlaceContents.
I was looking for a way to open the file only once, but it looks like this won't be possible if the Viewer DLL won't be working with the pxclib40/xcpro40 DLLs.
Is there any other way, using the pxclib40/xcpro40 DLLs, to get a BMP image of a PDF page without extracting the image?
No - the libraries require that you extract the page as an image, assuming it is an image-based PDF. If it is text-based, the library DLLs cannot help at all to convert it to an image - they have no such functionality - whereas the Viewer allows you alternatives, as it would appear you have already found.
So I am afraid for now at least that is the only option we have to offer.