Hi,
Using the PXCp_ET_... methods I'm able to extract all text elements that lie inside a given area. Now I want to copy the found text to the clipboard, preserving as much formatting as possible (line breaks and blanks). I can do this by simply merging all the characters of the found text elements into one string, but that loses any information about blanks between characters. So I'm looking for a way to detect whether there is a blank between two characters of a text element.
My idea is to compare the offsets of the two characters and, if the difference is larger than the 'size' of a blank, insert a blank between them in my result string. But how do I get the size of a blank? Is there a better way than starting with the text element's font info and using the low-level API to process all the font objects to get the width?
Or in general: is there a good algorithm to detect if there is a blank between two characters when only their position is known?
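The offset comparison described above could look roughly like this. It is a minimal Python sketch under assumptions: the hypothetical `Glyph` record stands in for whatever per-character data the extraction returns (origin x and advance width, in the same units), and the width of a blank is assumed to be known already. One detail worth noting: it measures the gap from the end of one glyph (x + width) to the start of the next, rather than origin to origin, so a wide glyph followed by a narrow one is not taken for a word break. The `factor` of 0.5 is an arbitrary starting value, not a tested one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    # Hypothetical per-character record; not a PXCp type.
    char: str
    x: float      # x of the glyph origin (user space)
    width: float  # advance width of the glyph (same units as x)


def join_with_spaces(glyphs: List[Glyph], space_width: float, factor: float = 0.5) -> str:
    """Concatenate the glyphs of one line, inserting a blank wherever the gap
    between the end of one glyph and the start of the next exceeds
    factor * space_width."""
    if not glyphs:
        return ""
    out = [glyphs[0].char]
    for prev, cur in zip(glyphs, glyphs[1:]):
        gap = cur.x - (prev.x + prev.width)  # end-to-start gap, not origin-to-origin
        # Do not add a blank next to a real space character.
        if gap > factor * space_width and " " not in (prev.char, cur.char):
            out.append(" ")
        out.append(cur.char)
    return "".join(out)


if __name__ == "__main__":
    line = [Glyph("H", 0, 6), Glyph("i", 6, 3), Glyph("y", 12, 5), Glyph("o", 17, 5)]
    print(join_with_spaces(line, space_width=2.5))  # prints "Hi yo"
```

Only the i/y pair has a visible gap (12 - 9 = 3 units, above the 1.25 threshold), so a single blank is inserted there.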
I already tried computing the average distance between two text-element characters (for each different font) and inserting a blank whenever the distance between two characters is larger than this average multiplied by a factor. But this gives too many false blanks and also misses some.
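An average of origin-to-origin distances mixes glyph widths into the measurement (an "m" followed by an "i" is far apart even inside a word), which may contribute to both the false and the missing blanks. A commonly used alternative, shown here as an assumption rather than a known fix, is to measure end-to-start gaps and compare them against a fraction of the font size when no space width is available, and to start a new line when the baseline moves. For reference, the space glyph is 278/1000 em in Helvetica and 250/1000 em in Times-Roman, so the hypothetical `word_ratio` of 0.15 em sits below both; `line_tol` is likewise a guess. The sketch assumes the glyphs arrive in content-stream (reading) order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Glyph:
    # Hypothetical per-character record; not a PXCp type.
    char: str
    x: float          # x of the glyph origin (user space)
    y: float          # baseline y (user space)
    width: float      # advance width (user space)
    font_size: float  # effective font size (user space)


def rebuild_text(glyphs: List[Glyph], word_ratio: float = 0.15, line_tol: float = 0.5) -> str:
    """Heuristic text rebuild: new line when the baseline moves by more than
    line_tol * font_size; blank when the end-to-start gap exceeds
    word_ratio * font_size."""
    lines: List[List[Glyph]] = []
    current: List[Glyph] = []
    for g in glyphs:
        if current and abs(g.y - current[-1].y) > line_tol * g.font_size:
            lines.append(current)
            current = []
        current.append(g)
    if current:
        lines.append(current)

    text_lines = []
    for line in lines:
        line.sort(key=lambda g: g.x)  # left to right within the line
        parts = [line[0].char]
        for prev, cur in zip(line, line[1:]):
            gap = cur.x - (prev.x + prev.width)
            if gap > word_ratio * max(prev.font_size, cur.font_size):
                parts.append(" ")
            parts.append(cur.char)
        text_lines.append("".join(parts))
    return "\n".join(text_lines)


if __name__ == "__main__":
    sample = [Glyph("A", 0, 100, 6, 10), Glyph("B", 6, 100, 6, 10),
              Glyph("C", 15, 100, 6, 10), Glyph("D", 0, 88, 6, 10)]
    print(rebuild_text(sample))  # prints "AB C" and "D" on two lines
```

Scaling the threshold by font size rather than by a per-font average keeps it independent of how many word breaks happen to be in the sample.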
kind regards,
Ulrich
detecting blanks between textchars
Re: detecting blanks between textchars
Hi.
In general, fonts in PDF may not contain information about the width of the space character, because some PDF creators do not use the space character at all and may not include any information about it. And I'm afraid there is no general algorithm for detecting spaces, just some sort of approximations. The PXCp_ET_... functions cannot provide all the information you need, but you may try to collect it using the low-level API where possible. You will need to read section 5 (Text) of the PDF Reference, especially subsections 5.5 and 5.6. But do not expect the solution to be easy or complete.
You can download the PDF Reference manual for free from http://www.adobe.com
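For the low-level route, the PDF Reference (section 5.3.3, "Text Space Details") gives the horizontal displacement of a glyph as tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, where Tfs is the font size, Tc the character spacing, Tw the word spacing, Th the horizontal scaling (Tz / 100) and Tj a TJ-array adjustment. For simple fonts, w0 comes from the font's /FirstChar and /Widths entries, with the font descriptor's /MissingWidth as fallback. The sketch below applies that formula to the space character, assuming those values have already been read from the font dictionary; the function names are hypothetical and not part of the PXCp API, and the result is in text space (the text matrix and CTM still have to be applied).

```python
def glyph_width_units(code: int, first_char: int, widths: list, missing_width: float = 0.0) -> float:
    """Width of a character code in glyph space units (1000 = 1 text space
    unit) for a simple font, from its /FirstChar and /Widths entries; codes
    outside the array fall back to /MissingWidth (which defaults to 0)."""
    i = code - first_char
    if 0 <= i < len(widths):
        return float(widths[i])
    return float(missing_width)


def horizontal_advance(w0_units: float, font_size: float, char_spacing: float = 0.0,
                       word_spacing: float = 0.0, h_scale: float = 100.0,
                       tj_adjust: float = 0.0, is_single_byte_space: bool = False) -> float:
    """tx = ((w0 - Tj/1000) * Tfs + Tc + Tw) * Th, with w0 = width / 1000.
    Tw only applies to the single-byte character code 32; h_scale is Tz in
    percent. The result is in text space, not user space."""
    w0 = w0_units / 1000.0
    tw = word_spacing if is_single_byte_space else 0.0
    return ((w0 - tj_adjust / 1000.0) * font_size + char_spacing + tw) * (h_scale / 100.0)


if __name__ == "__main__":
    # Assumed Times-Roman-like font: /FirstChar 32, width of code 32 = 250.
    space = horizontal_advance(glyph_width_units(32, 32, [250, 333, 408]),
                               font_size=10, is_single_byte_space=True)
    print(space)  # 2.5
```

If the font has no width for code 32 and no /MissingWidth, `glyph_width_units` returns 0, which is exactly the case described above; a font-size-based approximation is then the remaining option.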
HTH
Victor
Tracker Software
Project manager
Please archive any files posted to a ZIP, 7z or RAR file or they will be removed and not posted.
Re: detecting blanks between textchars
Victor,
thanks for explaining. Looks like I have to dig into fonts.
kind regards,
Ulrich