In our application we need the complete word list, together with the quad (positional) information for each word.
In the old Viewer ActiveX SDK (with Delphi source), we did something like this to get words by index:
GetProperty('Documents[#'+IntToStr(aDocID)+'].Pages['+IntToStr(page)+'].Text.Words['+IntToStr(wordIdx)+'].String', vDataOut, 0);
GetProperty('Documents[#'+IntToStr(aDocID)+'].Pages['+IntToStr(page)+'].Text.Words['+IntToStr(wordIdx)+'].Offset', vDataOut, 0);
GetProperty('Documents[#'+IntToStr(aDocID)+'].Pages['+IntToStr(page)+'].Text.Words['+IntToStr(wordIdx)+'].Length', vDataOut, 0);
GetProperty('Documents[#'+IntToStr(aDocID)+'].Pages['+IntToStr(page)+'].Text.Words['+IntToStr(wordIdx)+'].Quads.Value', vDataOut, 0);
I can't find similar functions in the Editor SDK. I have read other answers that suggest different approaches, for example inspecting IPXC_PageText and its related sub-structures to get the positional information of the words, and then retrieving the characters with the GetChars method.
This is not a feasible solution for us, because we need to pre-analyze the whole PDF document in order to speed up real-time lookups, such as finding the word under the cursor, finding an entire sentence from a position, and so on.
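To make the requirement concrete, here is a minimal sketch of the kind of per-page index we would like to fill up front. It is only an illustration: TWordEntry, TPageWords and WordAtPoint are hypothetical names, not part of either SDK, the sample data is made up, and each quad is simplified to an axis-aligned bounding box.

```delphi
program WordIndexSketch;

{$APPTYPE CONSOLE}

type
  // Hypothetical record: one word plus the bounding box of its quad.
  TWordEntry = record
    Text: string;
    Offset: Integer;     // character offset of the word in the page text
    CharCount: Integer;  // number of characters in the word
    Left, Bottom, Right, Top: Double;  // box in page coordinates
  end;

  // All words of one page, filled once during the pre-analysis pass.
  TPageWords = array of TWordEntry;

// Returns the index of the first word whose box contains (X, Y), or -1.
function WordAtPoint(const Words: TPageWords; X, Y: Double): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := 0 to High(Words) do
    if (X >= Words[I].Left) and (X <= Words[I].Right) and
       (Y >= Words[I].Bottom) and (Y <= Words[I].Top) then
    begin
      Result := I;
      Exit;
    end;
end;

var
  Words: TPageWords;
  Idx: Integer;
begin
  // Made-up sample data standing in for what the SDK would return.
  SetLength(Words, 2);

  Words[0].Text := 'Hello';
  Words[0].Offset := 0;
  Words[0].CharCount := 5;
  Words[0].Left := 10;  Words[0].Bottom := 100;
  Words[0].Right := 40; Words[0].Top := 112;

  Words[1].Text := 'world';
  Words[1].Offset := 6;
  Words[1].CharCount := 5;
  Words[1].Left := 45;  Words[1].Bottom := 100;
  Words[1].Right := 75; Words[1].Top := 112;

  // Look up the word under a (made-up) cursor position.
  Idx := WordAtPoint(Words, 50, 105);
  if Idx >= 0 then
    WriteLn('Word under cursor: ', Words[Idx].Text)
  else
    WriteLn('No word under cursor');
end.
```

With the old SDK we could fill such a list from the Words[...] properties shown above; what we are missing is an equivalent way to enumerate words (string, offset, length, quads) in the Editor SDK.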
Any suggestions? Thanks in advance.
Fabrizio