bug in text extraction with PXCp_ET_GetElement?

Post by uko »

Hi,

I'm using Build 160 of the Pro SDK to extract text with PXCp_ET_GetElement. In general it works fine, but I now have a PDF that gives strange results:
the text element reports the same offset for more than one character:
characters: H,i,l,b,e,r,t,s,p,a,c,e,#0
Offset: 0,0,0,0,0.296,0.296,0.296,3.36,3.36,3.36,3.36,3.36,3.36

This is the relevant part of the PDF: TD[(Hilb)-27(ert)-311(space)]

So is this a bug? And if not, how can I determine the position of the third character (l)?
Attached is a sample page showing the problem (look for the first appearance of 'Hilbert').
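
For reference, here is a rough sketch of what I would expect the offsets to look like. In the array form of the text-showing operator (TJ in the PDF reference), each number is given in thousandths of a text space unit and is subtracted from the current position, and each glyph then advances the position by its own width. The font size and the flat glyph width below are made-up values, not taken from the sample file, and character spacing, word spacing and horizontal scaling are ignored:

```pascal
program TJOffsets;

{$APPTYPE CONSOLE}

// Sketch only: expected start position of each glyph in a TJ-style array,
// measured from the start of the element.
// ASSUMPTIONS: FontSize and GlyphWidth are hypothetical values; a real font
// has a different width per glyph.

const
  FontSize   = 10.0;   // hypothetical font size
  GlyphWidth = 500.0;  // hypothetical width of every glyph, in 1/1000 units

type
  TRun = record
    Text: string;
    AdjustBefore: Double; // number that precedes the string in the array
  end;

const
  Runs: array[0..2] of TRun = (
    (Text: 'Hilb';  AdjustBefore: 0),
    (Text: 'ert';   AdjustBefore: -27),
    (Text: 'space'; AdjustBefore: -311)
  );

var
  i, j: Integer;
  X: Double;
begin
  X := 0;
  for i := Low(Runs) to High(Runs) do
  begin
    // The number is subtracted from the current position, so a negative
    // value moves the following glyphs to the right.
    X := X - Runs[i].AdjustBefore / 1000 * FontSize;
    for j := 1 to Length(Runs[i].Text) do
    begin
      WriteLn(Runs[i].Text[j], ' starts at ', X:0:3);
      // Advance by the width of the glyph that was just placed.
      X := X + GlyphWidth / 1000 * FontSize;
    end;
  end;
end.
```

With any non-zero glyph width every character gets its own position. In my output the offsets only change at the boundaries between the strings of the array, which looks as if the glyph widths contribute nothing, but that is only my guess.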

Here's the code I'm using to get a text element:

var
  hr: HResult;
begin
  Result := False;

  // Determine the memory required for the text element
  ATextElement.cbSize := SizeOf(PXP_TextElement);
  ATextElement.Count := 0;
  ATextElement.mask := 0;
  hr := PXCp_ET_GetElement(FDocument, AIndex, @ATextElement, 0);
  if ((not IS_DS_FAILED(hr)) and (ATextElement.Count > 0)) then
  begin
    ATextElement.mask := PTEM_Text + PTEM_Offsets + PTEM_Matrix +
                         PTEM_FontInfo + PTEM_TextParams;

    ATextElement.Characters := nil;
    ATextElement.Offsets := nil;
    SetLength(ATextElement.Characters, ATextElement.Count);
    SetLength(ATextElement.Offsets, ATextElement.Count);

    // Read the text element
    if AIgnorePageRotation then
      hr := PXCp_ET_GetElement(FDocument, AIndex, @ATextElement, GTEF_IgnorePageRotation)
    else
      hr := PXCp_ET_GetElement(FDocument, AIndex, @ATextElement, 0);

    Result := not IS_DS_FAILED(hr);
  end;
end;
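
The symptom can also be checked for after the call returns. This is a minimal sketch, not part of the SDK: HasRepeatedOffsets is a hypothetical helper, and the sample array simply repeats the offsets listed above without the one for the trailing #0. Treating any two equal neighbouring offsets as suspicious is an assumption, since a font may legitimately contain zero-width glyphs:

```pascal
program CheckOffsets;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: returns True when two neighbouring characters
// report the same offset.
function HasRepeatedOffsets(const Offsets: array of Double): Boolean;
var
  i: Integer;
begin
  Result := False;
  // Open array parameters are always indexed from 0.
  for i := 1 to High(Offsets) do
    if Abs(Offsets[i] - Offsets[i - 1]) < 1E-9 then
    begin
      Result := True;
      Exit;
    end;
end;

const
  // Offsets reported for H,i,l,b,e,r,t,s,p,a,c,e (terminator left out).
  Sample: array[0..11] of Double =
    (0, 0, 0, 0, 0.296, 0.296, 0.296, 3.36, 3.36, 3.36, 3.36, 3.36);

begin
  if HasRepeatedOffsets(Sample) then
    WriteLn('Element has repeated offsets')
  else
    WriteLn('Offsets look plausible');
end.
```

For the element above the check reports repeated offsets, so such elements could at least be flagged before their positions are used.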

best regards,
Ulrich

Re: bug in text extraction with PXCp_ET_GetElement?

Post by Lzcat - Tracker Supp »

Yes, there was a bug in xcpro40 in the handling of fonts that use double values as character widths. It is fixed now, but I'm afraid the fix will only be available with build 162.
Victor
Tracker Software
Project manager

Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.

Re: bug in text extraction with PXCp_ET_GetElement?

Post by uko »

Thanks Victor,

Can you give a rough estimate of when Build 162 will be available?


best regards,
Ulrich

Re: bug in text extraction with PXCp_ET_GetElement?

Post by John - Tracker Supp »

Best regards
Tracker Support
http://www.tracker-software.com

Re: bug in text extraction with PXCp_ET_GetElement?

Post by uko »

Thank you very much!

best regards,
Ulrich

Re: bug in text extraction with PXCp_ET_GetElement?

Post by John - Tracker Supp »

Pleasure ;)
Best regards
Tracker Support
http://www.tracker-software.com