Spaces and CR/LF characters are missing when I use the DLLs to extract text, for example with the PXCp_ET_GetPageContentAsTextW function.
Problem:
When I use the DLLs to extract text, for example with PXCp_ET_GetPageContentAsTextW, spaces and CR/LF characters are not included in the output. Is there an argument that makes the function include them?
Resolution:
There are two methods to resolve this issue:
A) Fill in the PXP_TETextComposeOptions structure, specifically its AddSpaces parameter. See the SDK documentation for this structure for further information.
B) Use the PXCp_ET_GetElementCount and PXCp_ET_GetElement functions to retrieve each text element together with its position, then compose the text yourself. This requires you to implement a text composition algorithm, but the results are greatly improved: the text can be extracted with spaces and line breaks intact.
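
To illustrate method B, the following is a minimal sketch of one possible text composition algorithm, not the SDK's own. It makes no SDK calls. The TextElement class, its fields, and the compose_text function with its two threshold parameters are hypothetical names introduced only for this example. It assumes you have already copied the text and position of each element returned by PXCp_ET_GetElement into such records. Elements are grouped into lines by baseline, ordered left to right, and a space is inserted wherever the horizontal gap between neighbours exceeds a fraction of the font size.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextElement:
    """Hypothetical record for one extracted element; not an SDK type."""
    text: str
    x: float      # left edge of the text run
    y: float      # baseline; assumes PDF-style coordinates (larger y is higher)
    width: float  # horizontal extent of the text run
    size: float   # font size, used to scale the thresholds


def compose_text(elements: List[TextElement],
                 line_tol: float = 0.5,
                 gap_ratio: float = 0.25) -> str:
    """Rebuild text with spaces and CR/LF from positioned elements."""
    # Visit elements from the top of the page to the bottom.
    ordered = sorted(elements, key=lambda e: (-e.y, e.x))

    # Group elements whose baselines are close enough into the same line.
    lines: List[List[TextElement]] = []
    for el in ordered:
        if lines and abs(lines[-1][0].y - el.y) <= line_tol * el.size:
            lines[-1].append(el)
        else:
            lines.append([el])

    composed = []
    for line in lines:
        line.sort(key=lambda e: e.x)  # left to right within the line
        parts = [line[0].text]
        for prev, cur in zip(line, line[1:]):
            # Gap between the end of the previous run and the start of this one.
            gap = cur.x - (prev.x + prev.width)
            if gap > gap_ratio * cur.size:
                parts.append(" ")
            parts.append(cur.text)
        composed.append("".join(parts))

    # Join lines with CR/LF, as requested in the problem statement.
    return "\r\n".join(composed)
```

The default thresholds are guesses and will likely need tuning per document. Rotated text, multi-column layouts, and right-to-left scripts are not handled by this sketch.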