bug in text extraction with PXCp_ET_GetElement?

Post by **uko** » Wed Apr 29, 2009 1:53 pm

Hi,

I'm using Build 160 of the Pro SDK to extract text with PXCp_ET_GetElement. In general it works fine, but now I have a PDF that gives strange results:
the text element reports the same offset for more than one character:
characters: H,i,l,b,e,r,t,s,p,a,c,e,#0
Offset: 0,0,0,0,0.296,0.296,0.296,3.36,3.36,3.36,3.36,3.36,3.36

This is the relevant part of the PDF: TD[(Hilb)-27(ert)-311(space)]

So is this a bug? And if not, how can I determine the position of the third character (l)?
Attached is a sample page with this problem (look for the first occurrence of 'Hilbert').

Here's the code I'm using to get a text element:

Code:

var
  hr: HResult;
begin
  Result := False;

  // Determine the memory required for the text element
  ATextElement.cbSize := SizeOf(PXP_TextElement);
  ATextElement.Count := 0;
  ATextElement.mask := 0;
  hr := PXCp_ET_GetElement(FDocument, AIndex, @ATextElement, 0);
  if ((not IS_DS_FAILED(hr)) and (ATextElement.Count > 0)) then
  begin
    ATextElement.mask := PTEM_Text + PTEM_Offsets + PTEM_Matrix +
                         PTEM_FontInfo + PTEM_TextParams;

    ATextElement.Characters := nil;
    ATextElement.Offsets := nil;
    SetLength(ATextElement.Characters, ATextElement.Count);
    SetLength(ATextElement.Offsets, ATextElement.Count);

    // Read the text element
    if AIgnorePageRotation then
      hr := PXCp_ET_GetElement(FDocument, AIndex, @ATextElement, GTEF_IgnorePageRotation)
    else
      hr := PXCp_ET_GetElement(FDocument, AIndex, @ATextElement, 0);

    Result := not IS_DS_FAILED(hr);
  end;
end;

best regards,
Ulrich

Thu Apr 30, 2009 9:07 am

Yes, there was a bug in xcpro40 in the handling of fonts that use double values as character widths. It is fixed now, but I'm afraid the fix will only be available with build 162.
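
For anyone checking the values a fixed build returns, here is a minimal sketch of how the start offset of each character in an array like [(Hilb)-27(ert)-311(space)] would normally accumulate under the PDF text-positioning rules: every glyph advances the pen by its width / 1000 × font size, and every number in the array shifts the pen by -number / 1000 × font size. The glyph widths and the font size below are hypothetical (they are not taken from the attached PDF), character spacing, word spacing and horizontal scaling are ignored, and the SDK may report offsets in different units, so treat this only as a sanity check on the pattern: with non-zero widths, each character should get a distinct, increasing offset.

```pascal
program TJOffsets;

{$APPTYPE CONSOLE}

// Sketch only: expected start offset of each character in a TJ-style array.
// All widths and the font size are HYPOTHETICAL illustration values.

const
  FontSize = 10.0;                        // hypothetical font size
  Sample: string = 'Hilbertspace';
  // Hypothetical glyph widths in 1/1000 text-space units, one per character.
  Widths: array[1..12] of Double =
    (750.5, 277.8, 277.8, 555.6, 444.4, 391.7,
     388.9, 394.4, 555.6, 500.0, 444.4, 444.4);
  // Array adjustment applied BEFORE the character at this index (0 = none):
  // [(Hilb)-27(ert)-311(space)] -> -27 before 'e' (5), -311 before 's' (8).
  Adjust: array[1..12] of Double =
    (0, 0, 0, 0, -27, 0, 0, -311, 0, 0, 0, 0);

var
  I: Integer;
  Pen: Double;
begin
  Pen := 0.0;
  for I := 1 to Length(Sample) do
  begin
    // The array number is subtracted, so a negative value moves the pen right.
    Pen := Pen - Adjust[I] / 1000.0 * FontSize;
    WriteLn(Sample[I], ': ', Pen:0:3);
    // Advance by the glyph's own width before the next character.
    Pen := Pen + Widths[I] / 1000.0 * FontSize;
  end;
end.
```

If every character inside one string of the array comes back with the same offset, as in the output quoted above, that matches the glyph widths being dropped and only the array adjustments being applied.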

Post by **uko** » Thu Apr 30, 2009 9:14 am

Thanks Victor,

Can you give a rough estimate of when Build 162 will be available?

best regards,
Ulrich

Post by **John - Tracker Supp** » Wed May 06, 2009 9:20 pm

It's out today

https://www.pdf-xchange.com/downloads/dev/

Cheers!

Post by **uko** » Thu May 07, 2009 7:06 am

Thank you very much!

best regards,
Ulrich

Post by **John - Tracker Supp** » Thu May 07, 2009 4:30 pm

Pleasure
