LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Date: Thu, 9 Jan 2003 21:22:57 +0100
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
From: Lars Hellström <[log in to unmask]>
At 21.34 +0100 2003-01-07, Frank Mittelbach wrote:
>following up on the discussion concerning utf-8 support for LaTeX, below is a
>package written to provide that support within the inputenc framework.
>
>it is not complete, nor are its tables set up finally, we would need some
>volunteers to help us here.
>
>but first i would like to hear comments/suggestions on the approach

I like it -- it's a nice compromise between practical usefulness and the
ideal that input and output encodings should be independent.

>%    \begin{macrocode}
>\gdef\DeclareUnicodeCharacter#1#2{%
>   \count@"#1\relax
>   \typeout{ \space\space defining Unicode char #1 (decimal \the\count@)}%

That line would probably be better as

  \typeout{ \space\space defining Unicode char U+#1 (decimal \the\count@)}%
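As an illustration, a minimal test document that would trigger the message. It assumes the proposed package is loaded as the utf8 option of inputenc and that \DeclareUnicodeCharacter takes a hex code point and a replacement, as in the quoted definition; the exact log wording is what the suggested \typeout would produce, not something verified against the package.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}% assumed: the proposed utf-8 support, via inputenc
% Declaring a character runs the \typeout discussed above. With the
% suggested wording the message would read roughly
%   "defining Unicode char U+00E9 (decimal 233)"
% so the hex code point and its decimal value cannot be confused.
\DeclareUnicodeCharacter{00E9}{\'e}
\begin{document}
caf\'e versus café
\end{document}
```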



>%</t1>
>%    \end{macrocode}
>%    The following definitions are in the encoding file but have no
>%    direct equivalent in Unicode or simply do not make sense in that
>%    context (or I couldn't find anything or \ldots :-).
>%\begin{verbatim}
>%\DeclareTextSymbol{\j}{OT1}{17}
>%\DeclareTextSymbol{\SS}{T1}{223}
>%\DeclareTextSymbol{\textcompwordmark}{T1}{23}

I would say that the compwordmark is U+200C (ZERO WIDTH NON-JOINER); that
character is, for example, intended to prevent ligatures between characters.
Adobe has assigned dotlessj to U+F6BE (LATIN SMALL LETTER DOTLESS J), but
that assignment is unofficial (and thus not universal) since it lies in the
Private Use Area. U+00DF (LATIN SMALL LETTER SHARP S) is uppercased as SS
(two U+0053), so there probably isn't any Unicode character for \SS.
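A sketch of how these suggestions might be entered in the tables, using the \DeclareUnicodeCharacter interface from the quoted package. Which per-encoding table each line belongs in is left open, and the U+F6BE entry in particular is only an assumption, given its private-use status.

```latex
% Sketch only: candidate table entries based on the suggestions above.
% U+200C ZERO WIDTH NON-JOINER suppresses ligatures, as \textcompwordmark does.
\DeclareUnicodeCharacter{200C}{\textcompwordmark}
% U+F6BE is Adobe's private-use code for dotless j (unofficial, not portable).
\DeclareUnicodeCharacter{F6BE}{\j}
% Deliberately no entry for \SS: uppercase U+00DF is the two-character
% string "SS" (U+0053 U+0053), not a single code point.
```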

>% But the following (and some others) might actually lurk in Unicode
>%    somewhere\ldots
>%\begin{verbatim}
>%\DeclareTextSymbol{\textasteriskcentered}{OMS}{3}   % "03

How about U+2217 (ASTERISK OPERATOR)?
On the other hand, \textasteriskcentered is probably a glyph variant of the
normal asterisk rather than a separate character.
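If U+2217 were adopted, the table entry might look like the following hypothetical line; it relies only on \textasteriskcentered being available as an OMS text symbol, as in the quoted declaration.

```latex
% Sketch only: map U+2217 ASTERISK OPERATOR to the centred asterisk.
% The ordinary U+002A ASTERISK is ASCII and stays an ordinary "*".
\DeclareUnicodeCharacter{2217}{\textasteriskcentered}
```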

>%\DeclareTextCommand{\textcircled}{OMS}

U+24B6 (CIRCLED LATIN CAPITAL LETTER A) is described, via its compatibility
decomposition, as being approximately "<circle> A", where "<circle>" means
"something expressed by a higher-level protocol" (such as LaTeX). Hence I
don't think there is a Unicode character corresponding to \textcircled.
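Going the other way, a precomposed character such as U+24B6 could be mapped onto LaTeX's own markup, letting the "higher-level protocol" supply the circle. This is a hypothetical entry; whether the OMS circle looks acceptable around a capital letter depends on the fonts and is not checked here.

```latex
% Sketch only: the precomposed circled letter becomes LaTeX markup.
\DeclareUnicodeCharacter{24B6}{\textcircled{A}}
% \textcircled itself gets no entry, as argued above.
```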

Lars Hellström
