
Ligatures

Posted: Sat Jun 21, 2008 1:07 pm
by wmm
A ligature is a typographical entity where two or three separate characters are connected into one. The most common English example is the sequence "fi", where the top of the "f" is extended to merge with the dot of the "i".

I regularly work with a large document produced by LaTeX with numerous ligatures. Adobe Reader handles them transparently. For example, if I search for "first", Adobe Reader finds all the occurrences, even if the "fi" is a ligature, and if I copy and paste text containing a ligature, the pasted text has the separate characters.

PDF-Xchange Viewer, on the other hand, treats the ligatures as distinct single characters. If I search for "first", I find only instances where the ligature does not occur (in text set in a monospaced font, for instance, or with an initial capital), and copying and pasting such text inserts either the ligature or an obscure escape sequence, depending on the application into which I'm pasting.

The copy/paste issue isn't very significant; I can work around that fairly easily. The failure of search, however, is a major concern and could prevent me from using PDF-Xchange Viewer as my principal PDF reader. Can this be fixed in a forthcoming release?

(If you need an example, here's a version of the document:
http://www.open-std.org/jtc1/sc22/wg21/ ... /n2606.pdf
)

Re: Ligatures

Posted: Sat Jun 21, 2008 3:11 pm
by Bhikkhu Pesala
Foxit Reader suffers from the same problem, but there is a simple workaround: search for "ﬁrst".

Re: Ligatures

Posted: Sat Jun 21, 2008 3:52 pm
by wmm
Thanks for the suggestion, but unless I misunderstood what you were saying it doesn't work. I assume that what you meant was that I should enclose the string in quotes. I did that; the "find" command finds nothing, and the "search" command finds only the occurrences without the "fi" ligature.

Re: Ligatures

Posted: Sat Jun 21, 2008 4:21 pm
by Bhikkhu Pesala
Yes. I mean use the string in quotes, which uses the fi ligature from Alphabetic Presentation forms.

Find next from the toolbar finds the next occurrence. The Search command finds 382 entries. I am using the latest version. Check for updates if you're not.

Re: Ligatures

Posted: Sat Jun 21, 2008 6:04 pm
by wmm
Bhikkhu Pesala wrote:Yes. I mean use the string in quotes, which uses the fi ligature from Alphabetic Presentation forms.
I'm not sure exactly what you're saying here. What I did was to open the full search pane, type a double-quote followed by the five characters f-i-r-s-t followed by another double-quote, and then hit Search Now. That resulted in 1015 entries, all of them in monospaced font or with initial capital. (The "find" command, with that string, finds nothing.) What did you mean by "Alphabetic Presentation" forms?
Bhikkhu Pesala wrote:Find next from the toolbar finds the next occurrence. The Search command finds 382 entries. I am using the latest version. Check for updates if you're not.
If I copy and paste the word "first" from an occurrence containing the ligature and search for that (with or without quotes), I get 382 entries -- only occurrences with the ligature, none of the ones in the list of monospaced or initial-capital forms.

Adobe Reader, when I search for first (typing all five characters), finds (very slowly!) 1385 entries, including both with and without the ligature. (I don't know why that's 12 fewer than the union of the results from the PDF-Xchange Viewer searches, 382 + 1015 = 1397.)

(I'm using the latest version, too (2.0 build 38).)

Re: Ligatures

Posted: Sat Jun 21, 2008 7:03 pm
by Bhikkhu Pesala
Build 38 is not even announced yet — it must be very new. I just updated.

The ff, fi, fl, ffi, and ffl ligatures are in the Unicode block called Alphabetic Presentation Forms.

Your document contains a mixture of "fi" ligatures (382) and "f i" as two separate characters (1015 occurrences). The monospaced font uses "f i" while the proportional font uses the ligatures. However, no ligature is used in, for example, the word "effect", which occurs many times.

Adobe Reader 7.1 finds the total less 12 (1385 occurrences) almost instantly here. I'm not sure why it is missing those 12. It also finds 1385 occurrences if I search for "first" with the ligature, i.e. it ignores the distinction, which is less useful in my opinion, though I can see the other POV too. Joe Bloggs doesn't care whether ligatures are used or not; he just wants to find what he's looking for.

If you use an OpenType font, the text string will be separate letters f·i·r·s·t but any f·i pairs will be replaced with ligatures. That's why spell-check doesn't fail when using OpenType fonts, but it does if you're inserting the Alphabetic Presentation Forms into your document.
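For readers who want to see the distinction concretely, here is a minimal Python sketch. It assumes the text extracted from the PDF contains the presentation-form code points (U+FB00 to U+FB06) and uses Unicode NFKC normalization to fold them back to plain letters. The helper name `fold_ligatures` is made up for illustration; this is not how any particular viewer implements search.

```python
import unicodedata

# Latin ligatures in the Alphabetic Presentation Forms block (U+FB00..U+FB06).
LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB07)]


def fold_ligatures(text: str) -> str:
    """Replace compatibility characters such as U+FB01 with plain letters.

    Hypothetical helper: NFKC also folds other compatibility characters
    (full-width forms, superscripts, ...), which may or may not be wanted.
    """
    return unicodedata.normalize("NFKC", text)


# List each ligature with its Unicode name and its folded form.
for lig in LIGATURES:
    print(f"U+{ord(lig):04X} {unicodedata.name(lig)} -> {fold_ligatures(lig)}")

extracted = "\ufb01rst"  # "first" stored with the fi ligature (4 code points)
print("first" in extracted)                  # False: literal comparison fails
print("first" in fold_ligatures(extracted))  # True: matches after folding
```

A literal substring comparison misses the word because the ligature is one code point, while the typed query contains two separate letters.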

Re: Ligatures

Posted: Sat Jun 21, 2008 8:03 pm
by wmm
Bhikkhu Pesala wrote:Build 38 is not even announced yet — it must be very new. I just updated.
Yes. I just downloaded and started using PDF-Xchange Viewer yesterday for the first time, so when you mentioned that you were using the latest version, I figured that I had it, too. Just to make sure, though, I checked for updates, and surprisingly there was a new one. It was built last night, according to the "about" blurb.
Bhikkhu Pesala wrote:The ff, fi, fl, ffi, and ffl ligatures are in the Unicode block called Alphabetic Presentation Forms.
Ah, thanks.
Bhikkhu Pesala wrote:Your document contains a mixture of "fi" ligatures (382) and "f i" as two separate characters (1015 occurrences). The monospaced font uses "f i" while the proportional font uses the ligatures. However, no ligature is used in, for example, the word "effect", which occurs many times.

Adobe Reader 7.1 finds the total less 12 (1385 occurrences) almost instantly here. I'm not sure why it is missing those 12. It also finds 1385 occurrences if I search for "first" with the ligature, i.e. it ignores the distinction, which is less useful in my opinion, though I can see the other POV too. Joe Bloggs doesn't care whether ligatures are used or not; he just wants to find what he's looking for.
Just call me "Joe," then :wink: -- I need to reliably find all the places a given term is used (not "first," obviously -- that was just an easy example), from a string I type in.

(Adobe Reader is finding the results "almost instantly" now here, too; I guess it built an index the first time. That time, though, it took several times longer than PDF-Xchange Viewer's search, maybe 10-12 seconds.)
Bhikkhu Pesala wrote:If you use an OpenType font, the text string will be separate letters f·i·r·s·t but any f·i pairs will be replaced with ligatures. That's why spell-check doesn't fail when using OpenType fonts, but it does if you're inserting the Alphabetic Presentation Forms into your document.
Unfortunately, I'm not the author of the document, just a consumer, so I don't control how it's produced. I don't know why "fi" ligatures are used but "ff" ones are not, for instance, and I can't choose which fonts are used.

Thanks for the background. Hopefully the developers will be able to do something relatively soon to handle ligatures more usefully. It's possible to work around the problems using the "advanced search" capabilities, but it's a pain, and I'm afraid I'm going to be misled if I don't happen to notice that what I'm searching for contains a ligature. (I've been working with versions of this document for years with Adobe Reader and never had to worry about ligatures before.)

Re: Ligatures

Posted: Sun Jun 22, 2008 12:07 pm
by Ivan - Tracker Software
Support for ligatures will be added in an upcoming build, together with improvements to the text editor for supporting East Asian languages, etc.

HTH

Re: Ligatures

Posted: Sun Jun 22, 2008 12:56 pm
by wmm
Ivan - Tracker Software wrote:Supporting of ligatures will be added into one of the next build together with improving text editor for supporting east asian languages, etc.
That's great! Thanks so much.

Re: Ligatures

Posted: Sun Jun 22, 2008 1:27 pm
by quant
Hi,

Just want to add my support for this; I didn't know before what was going on.
I often search for "finance" or something like that, and then realized that many occurrences were missed, so instead I had to search for "nance".

Thanks

Re: Ligatures

Posted: Tue Jun 24, 2008 4:20 am
by Bhikkhu Pesala
I'm glad to hear that this will be fixed sometime.

I suggest that a search for a string containing regular text should find words with regular text and words with ligatures, but a search for a string containing ligatures should find only ligatures.

Re: Ligatures

Posted: Tue Jun 24, 2008 10:12 am
by wmm
Bhikkhu Pesala wrote:I'm glad to hear that this will be fixed sometime.

I suggest that a search for a string containing regular text should find words with regular text and words with ligatures, but a search for a string containing ligatures should find only ligatures.
I understand the functionality reason for wanting search to work that way, but I think it would be very confusing to people if searching for a string they copied and pasted found fewer occurrences than if they typed the same string. If they weren't aware of the existence of a ligature in the copied string, it would just seem like a bug.

If this functionality is to be provided, I don't think it should be by default; it should either have its own option or at least be tied to another "exact search" option (like "match case" or "whole words only").
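To make the two behaviours being discussed concrete, here is a small Python sketch. It assumes page text is available as plain strings; the function name `count_matches` and the `exact` flag are hypothetical and do not correspond to any option in PDF-Xchange Viewer or Adobe Reader. By default both the text and the query are folded with NFKC, so typed and pasted queries give the same count; the opt-in `exact` mode compares code points literally.

```python
import unicodedata


def fold(text: str) -> str:
    """Fold ligatures (and other compatibility characters) to plain letters."""
    return unicodedata.normalize("NFKC", text)


def count_matches(pages: list[str], query: str, exact: bool = False) -> int:
    """Count case-sensitive occurrences of query across pages.

    exact=False (assumed default): ligatures and plain letters are equivalent.
    exact=True: a ligature only matches a ligature.
    """
    if not exact:
        pages = [fold(page) for page in pages]
        query = fold(query)
    return sum(page.count(query) for page in pages)


pages = ["the \ufb01rst item", "first(x)", "First of all"]
typed = "first"        # five separate letters
pasted = "\ufb01rst"   # copied from the document, contains U+FB01

print(count_matches(pages, typed))               # 2
print(count_matches(pages, pasted))              # 2
print(count_matches(pages, pasted, exact=True))  # 1
print(count_matches(pages, typed, exact=True))   # 1
```

A real viewer would also have to map matches in the folded text back to glyph positions for highlighting, which this sketch leaves out.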

Re: Ligatures

Posted: Tue Jun 24, 2008 4:44 pm
by Bhikkhu Pesala
I think you're right. Adobe's way of doing it is probably best for most users.

Re: Ligatures

Posted: Tue Aug 04, 2009 6:37 pm
by Bhikkhu Pesala
This issue still affects the latest build. I think it is quite a significant issue that needs to be fixed sooner rather than later. I have no problem finding words like "effort" that include ligatures if I use Adobe Reader 7, but I cannot find them using PDF-XChange Viewer, unless I type "eﬀort" in the Find toolbar (using the Alphabetic Presentation Form, or ligature).

Re: Ligatures

Posted: Tue Aug 04, 2009 9:03 pm
by wmm
Yes, it's been well over a year now since Ivan assured us that support would be "added into one of the next build." This is a really significant handicap, and a fix would be very much appreciated.

Re: Ligatures

Posted: Tue Aug 04, 2009 9:12 pm
by Chris - Tracker Supp
Hi wmm,

I just wanted to comment that our development team is hard at work every day, usually putting in more hours than the average bear. New feature requests are being heard, worked on, and added all the time, but they often have to be prioritized or worked on in groups of related functionality, especially from a programming perspective. Ivan has stated above that this will be included when they get to address more advanced core text-editing functionality, right-to-left language support, and the like. I assure you that it will be addressed. I think a little patience and appreciation for the work these guys actually do is important; please understand that it will be realized as Ivan has stated.

Regards,
Chris

Re: Ligatures

Posted: Tue Aug 04, 2009 9:29 pm
by wmm
Yes, I wasn't intending to cast aspersions -- I'm in software development myself, and I understand that things have to be done in priority order and that what's important to me individually may not be what's needed by the user community at large. I was just expressing some disappointment that the feature turned out not to be as imminent as the earlier, very encouraging response had led me to believe. I'm pleased to hear that it's still on the roadmap. Thanks for the clarification.

Re: Ligatures

Posted: Tue Aug 04, 2009 10:57 pm
by Chris - Tracker Supp
Not a problem wmm,

We understand your comment as well; it's always a juggling game. We appreciate your patience and will do our best to incorporate this feature as soon as we can.

Regards,

Chris

Re: Ligatures

Posted: Wed Sep 30, 2009 7:24 am
by Bhikkhu Pesala
Though this may not be a bug, users accustomed to the behaviour in Adobe Reader will regard it as a bug since searching for words like "first" or "difficult" won't find them if Alphabetic Presentation Forms were used.

Re: Ligatures

Posted: Wed Sep 30, 2009 1:56 pm
by Tracker Supp-Stefan
Agreed Bhikkhu,
that someone used to those Alphabetic Forms will count this as a bug :)
Will check with Ivan for any more precise plans when this might get implemented.

Best regards,
Stefan

Re: Ligatures

Posted: Wed Dec 30, 2009 6:30 am
by Bhikkhu Pesala
Fixed in build 2.0.0043.0 :)

Re: Ligatures

Posted: Wed Dec 30, 2009 8:12 am
by Cadillakin
Bhikkhu Pesala wrote:Fixed in build 2.0.0043.0 :)
Not fixed.

The file I included as an attachment in this thread, https://forum.pdf-xchange.com/ ... =35&t=7614, still cannot be searched properly for "traffic."

Re: Ligatures

Posted: Wed Dec 30, 2009 8:56 am
by Bhikkhu Pesala
The bug is fixed. There is something else wrong with that document. Try searching for "Trafic" using Adobe Reader.

Re: Ligatures

Posted: Wed Dec 30, 2009 12:31 pm
by Vasyl-Tracker Dev Team
Hi guys,

There is a misunderstanding here: search in the attached document really does not work properly.
I tried searching for "traffic." (with a period at the end):
in Adobe: 4 instances found; in PDF-XChange: 3 instances.

We will investigate this problem.

Thanks.

Re: Ligatures

Posted: Wed Dec 30, 2009 12:33 pm
by wmm
Wonderful! Thank you so much! Now PDF-Xchange Viewer is perfect! :D

Re: Ligatures

Posted: Mon Jan 04, 2010 1:57 pm
by halabund
There's still room for improvement in this area though. Adobe Reader can ignore accents on letters (e.g. matches á when searching for a, or ά for α, etc.), even when the accent is a separate glyph in the document, can do stemming to a certain level, etc. Stemming is not a big deal for me, but the ability to ignore accents is quite useful.

On the other hand, XChange viewer searches noticeably faster.
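As an illustration of the accent-insensitive matching described above, here is a minimal Python sketch. It assumes the usual approach of decomposing to NFD and dropping combining marks; the helper name `strip_accents` is made up, and how Adobe Reader actually does it is not known from this thread.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks so accented letters compare equal to base letters."""
    # NFD splits precomposed letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks; keep the rest.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("\u00e1"))   # a  (precomposed a with acute)
print(strip_accents("\u03ac"))   # α  (Greek alpha with tonos)
print(strip_accents("a\u0301"))  # a  (accent stored as a separate code point)

page = "Curriculum vit\u00e6 and r\u00e9sum\u00e9"
print("resume" in strip_accents(page))  # True
```

Because the text is decomposed first, this handles both precomposed accented letters and the case where the accent is a separate character in the document.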

Re: Ligatures

Posted: Tue Jan 05, 2010 2:27 pm
by Vasyl-Tracker Dev Team
Hi,

We will try to add an option for ignoring accents on letters in the new version (V3).

Best
Regards.