detecting blanks between textchars

uko
User
Posts: 80
Joined: Fri Dec 14, 2007 2:40 pm

Post by uko »

Hi,

Using the PXCp_ET_... methods, I'm able to extract all text elements that are positioned inside a given area. Now I want to copy the found text to the clipboard, preserving as much formatting as possible (line breaks and blanks). I can do this by simply merging all the characters of the found text elements into a string, but that loses any information about blanks between characters. So I'm looking for a way to detect whether there is a blank between two characters of a text element.
My idea is to compare the offsets of the two characters and, if the difference is larger than the 'size' of a blank, insert a blank between them in my result string. But how do I get the size of a blank? Is there a better way than starting with the text element's font info and using the low-level API to process all the font objects to get the width?
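
The offset comparison described above could be sketched roughly as follows. This is only an illustration, not PDF-Tools SDK code: the `Glyph` record, its fields, and the `space_width` and `tolerance` parameters are hypothetical, and it assumes each character's x offset and advance width are already known in the same units.

```python
from typing import List, NamedTuple


class Glyph(NamedTuple):
    """Hypothetical record for one extracted character (not an SDK type)."""
    char: str
    x: float      # left edge of the character
    width: float  # advance width, in the same units as x


def join_with_blanks(glyphs: List[Glyph], space_width: float,
                     tolerance: float = 0.5) -> str:
    """Join characters, inserting a blank wherever the gap between two
    neighbours is at least tolerance * space_width.
    Both parameters are assumptions that would need tuning."""
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        # Gap between the right edge of prev and the left edge of cur.
        gap = cur.x - (prev.x + prev.width)
        if gap >= tolerance * space_width:
            out.append(" ")
        out.append(cur.char)
    return "".join(out)
```

The open question remains where `space_width` comes from when the font does not define a space glyph.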

Or in general: is there a good algorithm to detect whether there is a blank between two characters when only their positions are known?

I have already tried computing the average distance between two text element characters (separately for each font) and inserting a blank whenever the distance between two characters is larger than this average multiplied by a factor. But this produces too many false blanks and also misses some.
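
For reference, the averaging heuristic described above might look like the sketch below. The function name, the `factor` default, and the use of start-to-start x offsets are assumptions made for illustration; the characters are assumed to come from one font and one line.

```python
from statistics import mean
from typing import List


def blanks_by_average(chars: str, xs: List[float], factor: float = 1.5) -> str:
    """Join chars, inserting a blank where the distance between consecutive
    x offsets exceeds factor * the mean distance. factor is a guess."""
    if len(chars) != len(xs):
        raise ValueError("chars and xs must have the same length")
    if len(chars) < 2:
        return chars
    # Start-to-start distances between neighbouring characters.
    dists = [b - a for a, b in zip(xs, xs[1:])]
    threshold = factor * mean(dists)
    out = [chars[0]]
    for ch, dist in zip(chars[1:], dists):
        if dist > threshold:
            out.append(" ")
        out.append(ch)
    return "".join(out)
```

Two properties of this sketch may explain part of the misfiring: the mean includes the word gaps themselves, so the threshold shifts with the number of blanks in the sample, and a wide glyph followed by narrow ones produces a large start-to-start distance that looks like a gap.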



kind regards,
Ulrich
Lzcat - Tracker Supp
Site Admin
Posts: 677
Joined: Thu Jun 28, 2007 8:42 am

Re: detecting blanks between textchars

Post by Lzcat - Tracker Supp »

Hi.
In general, fonts in a PDF may not contain information about the width of the space character, because some PDF creators do not use the space character at all and may not include any information about it. And I'm afraid there is no common algorithm for detecting spaces, only approximations of various kinds. The PXCp_ET_... functions cannot provide all the information you need, but you may try to collect it using the low-level API, where that is possible. You will need to read section 5 (Text) of the PDF Reference, especially subsections 5.5 and 5.6. But do not expect the solution to be easy or complete.
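
As a rough sketch of the kind of approximation meant here: for a simple (non-Type 3) font, the PDF Reference stores glyph widths in the `Widths` array, indexed from `FirstChar`, in units of 1/1000 of a text-space unit. The code below assumes the font dictionary has already been read into a plain Python mapping (it does not use any PDF-Tools SDK call), assumes character code 32 is the space in the font's encoding, and falls back to an arbitrary fraction of the font size when no usable width exists. It ignores character spacing, word spacing, and horizontal scaling.

```python
from typing import Mapping

SPACE_CODE = 32  # assumption: the font's encoding maps code 32 to a space


def space_width(font: Mapping[str, object], font_size: float,
                fallback_ratio: float = 0.25) -> float:
    """Estimate the width of a blank in text-space units.

    font is a hypothetical, already-parsed simple font dictionary with the
    PDF keys FirstChar and Widths (glyph widths in 1/1000 units).
    fallback_ratio is an arbitrary guess used when no width is available.
    """
    first = font.get("FirstChar")
    widths = font.get("Widths")
    if isinstance(first, int) and isinstance(widths, (list, tuple)):
        index = SPACE_CODE - first
        # Use the stored width only if code 32 is covered and non-zero.
        if 0 <= index < len(widths) and widths[index] > 0:
            return widths[index] / 1000.0 * font_size
    return fallback_ratio * font_size
```

Composite fonts (subsection 5.6) store widths differently, so this lookup would not apply to them as written.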

You can download the PDF Reference manual for free from http://www.adobe.com.

HTH
Victor
Tracker Software
Project manager


Re: detecting blanks between textchars

Post by uko »

Victor,

Thanks for explaining. Looks like I have to dig into fonts :-)

kind regards,
Ulrich