Heiko,

> I don't, but this is used in next.def, where 0xFE and 0xFF aren't
> part of the NextStep encoding:
>   \DeclareInputText{254}{}
>   \DeclareInputText{255}{}
> Thus actually an empty "LICR" is used here.

As I said, a mistake by the person contributing that encoding, and
harmless anyway --- removed by now.

> > > Thus the entry for U+02C6 in utf8enc.dfu is not really correct:
> > >   \DeclareUnicodeCharacter{02C6}{\textasciicircum}
> > >   U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT
> > > "\^" would be more correct, except that grabbing the
> > > argument isn't too trivial in case of utf-8 characters
> > > consisting of several bytes.
> >
> > Aren't you thinking of the COMBINING circumflex accent here?
>
> Yes.
>
> > MODIFIER characters are more phonetic alphabet thingies.
>
> Thanks.

But the combining characters don't work in TeX either, as Unicode
defines them as following the base char and TeX requires them to
precede the base char. In short, you can't turn Unicode text with
combining chars into TeX/LaTeX code without a preprocessing pass,
because you can't make base chars act on input that follows them
(sketch 1 in the PS below).

That is not to say that the line

> > >   \DeclareUnicodeCharacter{02C6}{\textasciicircum}

is fine: it is probably wrong and should most likely be

  \DeclareUnicodeCharacter{005E}{\textasciicircum}

and several others have similar defects. It would be good if that got
checked. But if you do, remember the direction of the check: one has
to start from what the font encoding provides, e.g. \textasciicircum
in this case, then find the Unicode code point for that glyph, and
that is what should show up in the dfu for the font encoding
(sketch 2 below).

> > > What is the LICR of "fi"?
> > >   U+FB01 LATIN SMALL LIGATURE FI
> > > The ligature mechanism depends on the used fonts, "fi" is not
> > > always available. What is better?
> > >   \DeclareUnicodeCharacter{FB01}{\textfi}
> > >   \ProvideTextCommandDefault{\textfi}{fi}
> > > vs.
> > >   \DeclareUnicodeCharacter{FB01}{fi}
> >
> > Definitely the latter. As I understand it, these ligatures are in
> > Unicode mostly for compatibility with legacy encodings (and perhaps
> > for font designers who need to assign something to these glyphs).
> > At least as far as TeX is concerned, "fi" doesn't carry any
> > semantic information different from "f" "i".
>
> Example: Assume there is a word "deaffish" and the author does not
> want a ligature ffi spanning both word parts. Therefore, having a
> good editor, he uses the Unicode sequence U+0066 U+FB01 to specify
> the correct and desired ligature. Using the latter case,
> \DeclareUnicodeCharacter{FB01}{fi}, TeX would get "ffi" and then
> form the wrong ligature.

A wrong example, in my opinion. As Lars said, the fi and ffi
ligatures ended up in Unicode as legacy codes because they were in
legacy 8-bit encodings. A million other ligatures are not available
as "chars", because Unicode, like most other standards, is heavily
influenced by what is right for certain countries but not for others.

Using "fi" in this way is like using tables in HTML to position
elements on the page, i.e. it works for that one example but ... So
the right thing is not to use fi at all here, but to use a generic
method to denote subword boundaries (or whatever) that allows the
formatter not to use the ligature. TeX's method would be
\textcompwordmark (sketch 3 below) ... but Unicode never considered
such encoding of logical information to be the task of the standard.

Some of Chris' and my musings on this subject can be found in the
paper we gave at the Unicode conference in 1996:

  http://www.latex-project.org/papers/unicode5.pdf

frank
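
PS, sketch 1: the ordering problem with combining marks, as a minimal
illustration (the example word is my own, not from the thread):

  % Unicode order:  U+006F U+0302   (base "o" first, COMBINING CIRCUMFLEX second)
  % TeX order:      \^o             (accent command first, base char second)
  \documentclass{article}
  \begin{document}
  h\^ote   % what a converter has to produce from  h  o  U+0302  t  e
  \end{document}

i.e. the accent has to be moved in front of the base char before TeX
ever sees the text, which is why a preprocessing pass is needed.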
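
PS, sketch 2: the direction of the check, spelled out. I'm using \ss
as a stand-in example here (my choice, easier to verify than the
circumflex case, and not copied from the shipped utf8enc.dfu):

  \documentclass{article}
  \usepackage[T1]{fontenc}
  \usepackage[utf8]{inputenc}
  % direction of the check:
  %   the font encoding provides \ss  -->  the Unicode code point for
  %   that glyph is U+00DF LATIN SMALL LETTER SHARP S  -->  so that
  %   is the slot that belongs in the dfu:
  \DeclareUnicodeCharacter{00DF}{\ss}
  \begin{document}
  straße   % the utf-8 character U+00DF now comes out as \ss
  \end{document}

Start from the font encoding command, not from a Unicode name that
merely sounds similar.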
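
PS, sketch 3: the \textcompwordmark version of the "deaffish"
example, assuming a T1-encoded font in which the ffi ligature would
otherwise be applied:

  \documentclass{article}
  \usepackage[T1]{fontenc}
  \begin{document}
  deaffish                      % the formatter is free to use the ffi ligature

  deaf\textcompwordmark fish    % ligature suppressed at the subword boundary
  \end{document}

The logical information (subword boundary) is marked up; which
ligatures to use or avoid is then the formatter's decision.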