## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Lars Hellström writes:

> >but first i would like to hear comments/suggestions on the approach
>
> I like it -- it's a nice compromise between practical usefulness and the
> ideal that input and output encodings should be independent.

you can explain that one day to me :-)

> >\gdef\DeclareUnicodeCharacter#1#2{%
> >   \count@"#1\relax
> >   \typeout{ \space\space defining Unicode char #1 (decimal \the\count@)}%
>
> That should probably be better as
>
>   \typeout{ \space\space defining Unicode char U+#1 (decimal \the\count@)}%

yes.
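
The quoted `\count@"#1\relax` line relies on TeX's `"` prefix, which reads the following uppercase hex digits as a number. A minimal sketch of just that conversion step (the helper name `\showcodepoint` is hypothetical and is not part of the proposed code):

```latex
\documentclass{article}
\makeatletter
% Hypothetical helper: report a code point given as uppercase hex digits.
\newcommand*\showcodepoint[1]{%
  \count@="#1\relax              % the " prefix makes TeX read #1 as hexadecimal
  \typeout{U+#1 is decimal \the\count@}}
\makeatother
\showcodepoint{20AC}% writes "U+20AC is decimal 8364" to the terminal and log
\begin{document}
\end{document}
```

TeX only accepts the uppercase letters A-F as hex digits here, so `20ac` would not be read as intended.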

> >%</t1>
> >%    \end{macrocode}
> >%    The following definitions are in the encoding file but have no
> >%    direct equivalent in Unicode or simply do not make sense in that
> >%    context (or I couldn't find anything or \ldots :-).
> >%\begin{verbatim}
> >%\DeclareTextSymbol{\j}{OT1}{17}
> >%\DeclareTextSymbol{\SS}{T1}{223}
> >%\DeclareTextSymbol{\textcompwordmark}{T1}{23}
>
> I would say that the compwordmark is U+200C (ZERO WIDTH NON-JOINER); that
> character is for example supposed to prevent ligaturing between characters.

yes
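
With the declaration interface quoted above, that mapping would presumably be a one-liner. A sketch, assuming the utf8 input encoding provides `\DeclareUnicodeCharacter` as discussed in this thread:

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
% Assumed mapping, following the suggestion above:
% U+200C (ZERO WIDTH NON-JOINER) -> \textcompwordmark
\DeclareUnicodeCharacter{200C}{\textcompwordmark}
\begin{document}
% A literal U+200C in the input would now behave like the explicit command:
shelf\textcompwordmark ful
\end{document}
```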

> Adobe has assigned dotlessj to U+F6BE (LATIN SMALL LETTER DOTLESS J), but
> that is unofficial (and thus not universal) as it resides in the private
> use area.

what we are trying to provide is a utf8 input encoding. how likely is it that some
editor or application generates that Adobe code point? not very, i would guess (at
least not now), so i would not assign anything.

> U+00DF (LATIN SMALL LETTER SHARP S) is uppercased as SS (two
> U+0053), so there probably isn't any \SS.

that was my guess too.

> >% But the following (and some others) might actually lurk in Unicode
> >%    somewhere\ldots
> >%\begin{verbatim}
> >%\DeclareTextSymbol{\textasteriskcentered}{OMS}{3}   % "03
>
> How about U+2217 (ASTERISK OPERATOR)?

that would be wrong in my opinion. the internal LaTeX form
\textasteriskcentered is clearly a text character and U+2217 is a math
symbol. so if some application requests U+2217 it should get a * in math
mode, set as (probably, i haven't checked the unicode page) a relation or a
binary operator.

> OTOH, \textasteriskcentered is probably a glyphic variation on the normal
> asterisk rather than a separate character.

more like that, which means we should not map it unless there is a dedicated
character for this in unicode

> >%\DeclareTextCommand{\textcircled}{OMS}
>
> U+24B6 (CIRCLED LATIN CAPITAL LETTER A) is described as being approximately
> "<circle> A", where "<circle>" means "something expressed by a higher level
> protocol" (such as LaTeX). Hence I don't think there is a character for
> this.

well there is U+20DD but this is a combining char and although that is more or
less what we want we can't support combining in the way unicode uses it, so
yes, this is not an individual char. However, one could implement the unicode
chars U+24B6-U+24E9 if we wish to do that as they would be

\DeclareUnicodeCharacter{24B6}{\textcircled{A}}
...
\DeclareUnicodeCharacter{24E9}{\textcircled{z}}
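
In that block U+24B6 to U+24CF are the circled capitals A-Z and U+24D0 to U+24E9 the circled small letters a-z, so the 52 declarations could be generated rather than typed out. A rough sketch, assuming pdfLaTeX with a current kernel (expl3 preloaded, utf8 input by default); the helper `\circled_declare:nn` is a hypothetical name, not something from this thread:

```latex
\documentclass{article}
\ExplSyntaxOn
% Hypothetical helper: #1 = first code point of a 26-letter run,
%                      #2 = character code of the first letter.
\cs_new_protected:Npn \circled_declare:nn #1 #2
  {
    \int_step_inline:nnn { 0 } { 25 }
      {
        % Expand both arguments first, so that \DeclareUnicodeCharacter
        % receives e.g. {24B6} and {\textcircled{A}}.
        \exp_args:Nxx \DeclareUnicodeCharacter
          { \int_to_Hex:n { #1 + ##1 } }
          { \exp_not:N \textcircled { \char_generate:nn { #2 + ##1 } { 11 } } }
      }
  }
\circled_declare:nn { "24B6 } { `A }  % circled capitals, U+24B6..U+24CF
\circled_declare:nn { "24D0 } { `a }  % circled small letters, U+24D0..U+24E9
\ExplSyntaxOff
\begin{document}
Ⓐ Ⓩ ⓐ ⓩ
\end{document}
```

Each declaration is stored separately, so the memory cost raised in this message still applies; the loop only saves typing.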

whether that is worth doing, I don't know. I guess as part of the exercise we
should perhaps build an extended list of all mappings from unicode to known
(and used) encoding-specific commands.

then people can pick additional mappings as necessary, or load the whole
thing if they have a big enough TeX. A good starting point would be
Sebastian's ucharacters.sty, from which my lists were more or less stolen

frank