On Sunday, 5 March 2006, at 21:30, Frank Mittelbach wrote:

> that is not to say that the line
>
>>>> \DeclareUnicodeCharacter{02C6}{\textasciicircum}
>
> is probably wrong it should be most likely
>
> \DeclareUnicodeCharacter{005E}{\textasciicircum}
>
> and several others have similar defects. would be good if that got
> checked.

Is that even a legal definition? U+005E (^) is, as was mentioned earlier
in this thread, syntax in LaTeX, so you can't have inputenc map it to
something. Or are you thinking about some attempt at supporting verbatim
input?

>> Example: Assuming there is a word "deaffish" and the
>> author does not want a ligature ffi spanning both word parts.
>> Therefore, having a good editor, he uses the Unicode sequence
>> U+0066 U+FB01 to specify the correct and desired ligature.
>> Using the latter case of \DeclareUnicodeCharacter{FB01}
>> TeX would get "ffi" and then form the wrong ligature.
>
> wrong example in my opinion. as Lars said: fi or ffi ligature ended up
> in unicode as legacy codes because they were in legacy 8-bit encodings.
> million other ligatures are not available as "chars" because UC like
> most other standards are heavily influenced by what is right for
> certain countries but not others. using "fi" in this way is like using
> tables in html to position elements on the page, ie it works for that
> example but ...
>
> so the right thing is not to use fi at all here but would be to use a
> generic method to denote subword boundaries or whatever to allow the
> formatter not to use the ligature. TeX's method would be
> \textcompwordmark ... but unicode never thought that such encoding of
> logical information is the task of the standard.

Actually, U+200C (ZERO WIDTH NON-JOINER) seems to me a perfect match to
\textcompwordmark, and I've entered it as such in my "Draft
specification for the T1 encoding".

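As a minimal sketch of what such a mapping could look like on the inputenc side (assuming pdfLaTeX with utf8 input and T1 fonts; a given LaTeX release may already provide this declaration, in which case the line below merely restates it):

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}

% Assumption: the mapping is declared explicitly here for illustration;
% it is not taken from any particular release of the utf8 support files.
\DeclareUnicodeCharacter{200C}{\textcompwordmark}

\begin{document}
deaffish                   % ffi ligature spans the word boundary

deaf\textcompwordmark fish % no ligature across the boundary
% A literal U+200C typed between "deaf" and "fish" would expand to the
% same \textcompwordmark under the declaration above.
\end{document}
```

With this, the logical information (a subword boundary) stays in the source, and the choice of ligatures is left to the font.
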
More pragmatically, one may of course write "deaf\-fish", which not only
avoids the ligature but also marks the proper point of hyphenation.

Lars Hellström