At 21.34 +0100 2003-01-07, Frank Mittelbach wrote:
>following up on the discussion concerning utf-8 support for LaTeX, below is a
>package written to provide that support within the inputenc framework.
>
>it is not complete, nor are its tables set up finally, we would need some
>volunteers to help us here.
>
>but first i would like to hear comments/suggestions on the approach

I like it -- it's a nice compromise between practical usefulness and the
ideal that input and output encodings should be independent.

>% \begin{macrocode}
>\gdef\DeclareUnicodeCharacter#1#2{%
> \count@"#1\relax
> \typeout{ \space\space defining Unicode char #1 (decimal \the\count@)}%

That would probably be better as

 \typeout{ \space\space defining Unicode char U+#1 (decimal \the\count@)}%

>%</t1>
>% \end{macrocode}
>% The following definitions are in the encoding file but have no
>% direct equivalent in Unicode or simply do not make sense in that
>% context (or I couldn't find anything or \ldots :-).
>%\begin{verbatim}
>%\DeclareTextSymbol{\j}{OT1}{17}
>%\DeclareTextSymbol{\SS}{T1}{223}
>%\DeclareTextSymbol{\textcompwordmark}{T1}{23}

I would say that the compwordmark is U+200C (ZERO WIDTH NON-JOINER); that
character is, for example, supposed to prevent ligatures from forming
between the characters on either side of it.

Adobe has assigned dotlessj to U+F6BE (LATIN SMALL LETTER DOTLESS J), but
that assignment is unofficial (and thus not universal), as it lies in the
Private Use Area.

U+00DF (LATIN SMALL LETTER SHARP S) is uppercased as SS (two U+0053), so
there probably isn't any Unicode equivalent of \SS.

>% But the following (and some others) might actually lurk in Unicode
>% somewhere\ldots
>%\begin{verbatim}
>%\DeclareTextSymbol{\textasteriskcentered}{OMS}{3} % "03

How about U+2217 (ASTERISK OPERATOR)? OTOH, \textasteriskcentered is
probably a glyph variant of the normal asterisk rather than a separate
character.
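Written out in the package's own \DeclareUnicodeCharacter syntax (first
argument a hex code point without the "U+", second the LaTeX command), the
two mappings suggested above would amount to roughly the following. This
is an untested sketch; the U+2217 entry in particular only makes sense if
\textasteriskcentered is accepted as a character of its own.

```latex
% Sketch only: candidate entries for the package's Unicode tables.
% U+200C ZERO WIDTH NON-JOINER, which is meant to suppress ligatures
% between its neighbours, mapped to the compound word mark.
\DeclareUnicodeCharacter{200C}{\textcompwordmark}
% U+2217 ASTERISK OPERATOR; tentative, since \textasteriskcentered may
% just be a glyph variant of the ordinary asterisk.
\DeclareUnicodeCharacter{2217}{\textasteriskcentered}
```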
>%\DeclareTextCommand{\textcircled}{OMS}

U+24B6 (CIRCLED LATIN CAPITAL LETTER A) is described as being approximately
"<circle> A", where "<circle>" means "something expressed by a higher-level
protocol" (such as LaTeX). Hence I don't think there is a character for
this.

Lars Hellström