Spaces and CR/LF characters are missing when I use the DLLs to extract text, for example with the PXCp_ET_GetPageContentAsTextW function.

Problem:

When I use the DLLs to extract text, for example with PXCp_ET_GetPageContentAsTextW, spaces and CR/LF characters are not included in the output. Is there an argument that preserves them?

Resolution:

There are two methods to resolve this issue:

A) Fill in the PXP_TETextComposeOptions structure, specifically the AddSpaces parameter. See the documentation for that structure for further information.

B) Use the PXCp_ET_GetElementCount and PXCp_ET_GetElement functions to retrieve each text element together with its position, then compose the text yourself. This requires implementing a text composition algorithm, but the results are considerably better, and the text can be extracted with spaces intact.
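
As an illustration of method B, here is a minimal sketch of one possible text composition algorithm. It does not call the DLL. It assumes that the elements returned by PXCp_ET_GetElement have already been copied into simple records holding the text, left edge, baseline, width, and font size. The TextElement record, its field names, and the two threshold factors are hypothetical and chosen only for this example; they are not part of the library, and the thresholds will likely need tuning for real documents.

```python
from dataclasses import dataclass


@dataclass
class TextElement:
    # Hypothetical record, NOT the library's element structure.
    text: str
    x: float          # left edge of the element
    y: float          # baseline; larger values are higher on the page
    width: float      # horizontal extent of the element
    font_size: float  # used to scale the gap thresholds


def compose_text(elements, space_factor=0.25, line_factor=0.5):
    """Join positioned text elements into a string with spaces and CR/LF.

    space_factor and line_factor are assumed heuristics: a horizontal gap
    wider than space_factor * font_size becomes a space, and a baseline
    shift larger than line_factor * font_size starts a new line.
    """
    # Order elements top to bottom, then left to right.
    ordered = sorted(elements, key=lambda e: (-e.y, e.x))

    # Group elements whose baselines are close enough into lines.
    lines = []
    current = []
    for el in ordered:
        if current and abs(el.y - current[0].y) > line_factor * el.font_size:
            lines.append(current)
            current = []
        current.append(el)
    if current:
        lines.append(current)

    # Within each line, insert a space wherever the gap is wide enough.
    composed = []
    for line in lines:
        line.sort(key=lambda e: e.x)
        parts = [line[0].text]
        for prev, el in zip(line, line[1:]):
            gap = el.x - (prev.x + prev.width)
            if gap > space_factor * el.font_size:
                parts.append(" ")
            parts.append(el.text)
        composed.append("".join(parts))

    # Separate lines with CR/LF.
    return "\r\n".join(composed)


sample = [
    TextElement("Hel", 10, 700, 15, 10),
    TextElement("lo", 25, 700, 10, 10),
    TextElement("world", 40, 700.2, 25, 10),
    TextElement("Next", 10, 688, 20, 10),
]
print(repr(compose_text(sample)))  # 'Hello world\r\nNext'
```

The sketch handles only horizontal, left-to-right text in a single column. Rotated text, multiple columns, and right-to-left scripts would need additional logic.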
