LATEX-L Archives

Mailing list for the LaTeX3 project


Date: Mon, 14 Jan 2019 07:15:34 +0100
From: Kelly Smith

2019-01-10T22:22:53+01:00, Frank Mittelbach:
>> There will need to be a way to
>> handle this gracefully—and more robustly than, say, just asking users
>> to prefer text commands.
> nobody does this or should do this (or did really even in the past). The
> whole idea of inputenc was to enable the user to use sophisticated input
> methods when available, eg type ä or ® or <you name it>
> the whole system works by converting whatever input method to a
> "standard LICR", eg
>   ä  -> \"{a}
>   ®  -> \textregistered
>   ...
> all behind the scenes. From that point on it could be safely used within
> LaTeX like sending it to the aux file or typesetting it as a single glyph
> when in T1 encoding or as \accent ... a when in OT1 or as ...
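
The mapping described above can be tried out with a small test file. This is only a sketch for pdfLaTeX, assuming a UTF-8 source file; the explicit inputenc line is an assumption about the setup (recent kernels default to UTF-8 anyway). Both spellings should end up as the same LICR and therefore give the same output.

```latex
% Compile with pdfLaTeX; the file itself must be saved as UTF-8.
\documentclass{article}
\usepackage[T1]{fontenc}     % T1 contains a-umlaut as a single glyph
\usepackage[utf8]{inputenc}  % maps the input bytes to the LICR form
\begin{document}
% Typed directly versus written as a text command:
ä and \"{a} should be typeset identically.
\end{document}
```

Commenting out the fontenc line leaves the document in OT1, where the same LICR is realised with the \accent primitive instead of a single glyph.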

I’m sorry, again I mistakenly assumed there was a problem, when in fact,
there was no problem at all!

> anyway, for history, but I think still somewhat illuminating for the whole
> area, I recommend my 1995 talk about encodings:

Thank you, this clarifies the situation.

> of course this is way before the advent of Unicode or XeTeX/LuaTeX ...

As for XeTeX and LuaTeX, some quick experiments I tried suggest that UTF-8
text doesn’t get converted to an internal representation. I guess that it
would be impractical to set up automatic conversions for such a large
character set. And, as you said, since UTF-8 text survives being written to
and read back from files, an internal representation isn’t critical.
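
An experiment of this kind might look like the sketch below. The macro name \test is purely illustrative. Under XeLaTeX or LuaLaTeX the log line should show the character itself, since it is stored as an ordinary character token rather than as a text command.

```latex
% Compile with XeLaTeX or LuaLaTeX; the file itself must be saved as UTF-8.
\documentclass{article}
\begin{document}
\def\test{ä}             % store the character in a macro
\typeout{\meaning\test}  % write the stored tokens to the terminal and log
\end{document}
```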

When running on XeTeX or LuaTeX, is there an interface for getting the
Unicode character(s) represented by a text command? (e.g. \"{a} -> ä,
useful for string comparison). I suppose one could easily achieve that by
locally redefining all the text commands?
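
A minimal sketch of that local-redefinition idea follows. \UnicodeOf and the uml@... lookup names are hypothetical helpers invented for this example, not an existing interface, and only \" (for two letters) and \textregistered are covered.

```latex
% Compile with XeLaTeX or LuaLaTeX; the file itself must be saved as UTF-8.
\documentclass{article}
\makeatletter
% \UnicodeOf<macro>{<text>}: hypothetical helper, not a kernel interface.
% Expands <text> with a few text commands locally redefined to the Unicode
% characters they stand for, then stores the result in <macro>.
\newcommand*\UnicodeOf[2]{%
  \begingroup
    \def\"##1{\csname uml@##1\endcsname}% look up the precomposed character
    \def\textregistered{®}%
    % \protected@edef leaves all other robust text commands unexpanded.
    \protected@edef\reserved@a{\endgroup\def\noexpand#1{#2}}%
  \reserved@a
}
% Lookup table for \"; only two entries are filled in for this sketch.
\@namedef{uml@a}{ä}
\@namedef{uml@o}{ö}
\makeatother
\begin{document}
\UnicodeOf\fromcommand{\"{a}}
\def\fromtext{ä}
% String comparison of the two macros:
\ifx\fromcommand\fromtext same\else different\fi
\end{document}
```

With these definitions the comparison should typeset "same". Text commands that are not redefined are left untouched in the result, so a complete solution would need an entry for every text command and every accent/base-letter pair, which is exactly the scale problem mentioned above.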