On 14.01.19 at 07:15, Kelly Smith wrote:
>> of course this is way before the advent of unicode or xetex/luatex ...
>
> As for XeTeX and LuaTeX, it appears that UTF-8 text doesn’t get converted
> to an internal representation, according to some quick experiments I tried.

In some sense I would love to make that happen (as there are also
advantages to it) but it would come at a high cost:

 - loss of speed (somewhat)

 - impossibility of using utf8 chars in csnames, e.g. in luatex you can do

     \newcommand\Füße{}

   as long as the chars are of \catcode 11; once they are active they
   can't be used in this way. While I personally wouldn't mind, I can
   understand that, depending on your language, you may well want that
   to be possible. (A complete example file is in the PS below.)

The second point really means that it is a no-go.

> I guess that it would be impractical to set up automatic conversions
> for such a large character set.

There is that.

> And, as you said, since the UTF-8 will
> survive reading and writing to and from files, an internal representation
> isn’t critical.

The downside is that you simply don't know whether the character will
typeset correctly in a given situation: that approach assumes that
everything is unicode and that there is something at the other end when
you ask the font to render the utf8 char. As that is not necessarily
true, you may end up with tofu.

In other words, a certain step backwards from the situation we had in
that respect in the 8-bit world.

> When running on XeTeX or LuaTeX, is there an interface for getting the
> Unicode character(s) represented by a text command? (e.g. \"{a} -> ä,
> useful for string comparison). I suppose one could easily achieve that by
> locally redefining all the text commands?

Not necessary, you just need to process the definitions in the right
state, e.g.

  \typeout{ä \"w}

  \makeatletter
  \protected@edef\foo{ä \"w}
  \show\foo

will give you this as output:

  ä \"w
  > \foo=macro:
  ->ä \"w.
  l.14 \show\foo

(A complete file to try this yourself is in the PPS below.)

cheers
frank
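
PS: a minimal sketch of the csname point above, assuming a current
LuaLaTeX (under pdfLaTeX the umlaut characters are active, so the
\newcommand would fail there); the file name and the macro body "feet"
are just made up for the example:

  % fuesse.tex -- compile with lualatex
  \documentclass{article}
  % ü and ß have \catcode 11 (letter) in LuaTeX, so they are
  % allowed inside a control sequence name
  \newcommand\Füße{feet}
  \begin{document}
  \Füße
  \end{document}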
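
PPS: and a complete file around the \protected@edef example, so you can
reproduce the terminal/.log output quoted above (the exact "l.14" line
number will of course differ in your own file):

  % compile with lualatex or xelatex
  \documentclass{article}
  \makeatletter
  % writes the argument to the terminal and the .log file
  \typeout{ä \"w}
  % expand the material with \protect in its non-typesetting state,
  % then inspect the resulting token list
  \protected@edef\foo{ä \"w}
  \show\foo
  \makeatother
  \begin{document}
  text
  \end{document}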