## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Subject: Re: Text Commands in XeLaTeX and LuaLaTeX
From: Frank Mittelbach
Reply To: Mailing list for the LaTeX3 project
Date: Tue, 12 Feb 2019 17:13:43 +0100

On 14.01.19 at 07:15, Kelly Smith wrote:

>> of course this is way before the advent of Unicode or XeTeX/LuaTeX ...
>
> As for XeTeX and LuaTeX, it appears that UTF-8 text doesn’t get converted
> to an internal representation, according to some quick experiments I tried.

In some sense I would love to make that happen (as there are also
advantages to it), but it would come at a high cost:

- loss of speed (somewhat)
- impossibility of using UTF-8 chars in csnames, e.g., in LuaTeX you can do

\newcommand\Füße{}

as long as the chars are of \catcode 11. Once they are active they
can't be used in this way, and while I personally wouldn't mind, I can
understand that, depending on your language, you would rather want that
to be possible.

The second point really means that it is a no-go.
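
The csname constraint can be tried out in a small test file. This is only a minimal sketch, assuming LuaLaTeX or XeLaTeX with their default setup, where letters such as ü and ß have \catcode 11; the replacement text "feet" and the \typeout line are additions for illustration only:

```latex
% Compile with lualatex or xelatex (assumption: a Unicode engine).
\documentclass{article}
% ü and ß are single characters of \catcode 11 here, so they can
% be part of a control word just like ASCII letters.
\newcommand\Füße{feet}
\begin{document}
\Füße
% Write the category code of ü to the terminal and log.
\typeout{catcode of ü = \the\catcode`ü}
\end{document}
```

Under an 8-bit engine with UTF-8 input, the bytes that make up ü are active characters, so the same definition would not create a macro named \Füße.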

> I guess that it would be impractical to set up automatic conversions
> for such a large character set.

There is that.

> And, as you said, since the UTF-8 will
> survive reading and writing to and from files, an internal representation
> isn’t critical.

The downside is that you simply don't know whether the character will
typeset correctly in a given situation. That approach assumes that
everything is Unicode and that there is something at the other end when
you ask the font to render the UTF-8 char. As that is not necessarily
true, you may end up with tofu (missing-glyph boxes). In other words, it
is a certain step backwards from the situation we had in that respect
in 8-bit.
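
One way to guard against tofu is to ask the current font whether it has the glyph before typesetting it. The following is a rough sketch, assuming a Unicode engine and the e-TeX primitive \iffontchar; \GlyphOrWarn is a hypothetical helper name, not a kernel command, and the fallback text "[?]" is arbitrary:

```latex
% Compile with lualatex or xelatex (assumption: a Unicode engine).
\documentclass{article}
% \GlyphOrWarn is a hypothetical helper, not part of LaTeX.
% It takes a single character and typesets it only if the
% current font provides a glyph for it.
\newcommand\GlyphOrWarn[1]{%
  \iffontchar\font`#1 %
    #1%
  \else
    \typeout{No glyph for #1 in the current font}%
    [?]%
  \fi}
\begin{document}
\GlyphOrWarn{ä} \GlyphOrWarn{ẅ}
\end{document}
```

This only checks the font that is current at the point of use, so it says nothing about how the same character fares after a font change.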

> When running on XeTeX or LuaTeX, is there an interface for getting the
> Unicode character(s) represented by a text command? (e.g. \"{a} -> ä,
> useful for string comparison). I suppose one could easily achieve that by
> locally redefining all the text commands?

Not necessary; you just need to process the definitions in the right
state, e.g.

\typeout{ä \"w}

\makeatletter
\protected@edef\foo{ä \"w}

\show\foo

will give you this as output:

ä \"w
> \foo=macro:
->ä \"w.
l.14 \show\foo

cheers
frank