LATEX-L Archives

Mailing list for the LaTeX3 project


From: Frank Mittelbach
Date: Tue, 12 Feb 2019 17:13:43 +0100
Sender: Mailing list for the LaTeX3 project

On 14.01.19 at 07:15, Kelly Smith wrote:

>> of course this is way before the event of unicode or xetex/luatex ...
> As for XeTeX and LuaTeX, it appears that UTF-8 text doesn’t get converted
> to an internal representation, according to some quick experiments I tried.

In some sense I would love to make that happen (as there are also
advantages to it), but it would come at a high cost:

  - some loss of speed
  - it would become impossible to use UTF-8 chars in csnames, e.g. in
    LuaTeX you can do


     as long as the chars are of \catcode 11. Once they are active they
can't be used in this way, and while I personally wouldn't mind, I can
understand that depending on your language you rather want that to be

The second point really means that it is a no-go.
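
For illustration only (the example originally given at this point is not preserved above, and the macro name \größe below is a made-up name, not taken from the message), a minimal sketch of a csname containing non-ASCII letters might look as follows. It assumes a Unicode engine (LuaLaTeX or XeLaTeX) in which ö and ß have \catcode 11, so that they are scanned as part of the command name:

```latex
% Compile with LuaLaTeX or XeLaTeX (assumption: ö and ß are \catcode 11 there).
\documentclass{article}
% \größe is a hypothetical command name chosen only to show non-ASCII
% letters inside a csname.
\newcommand\größe{12pt}
\begin{document}
The value stored in the macro is \größe.
\end{document}
```

If ö and ß were made active in order to convert them to an internal representation, the name would stop being scanned as a single csname, which is the cost described above.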

> I guess that it would be impractical to set up automatic conversions
> for such a large character set.

There is that.

> And, as you said, since the UTF-8 will
> survive reading and writing to and from files, an internal representation
> isn’t critical.

The downside is that you simply don't know if the character will typeset
correctly in a given situation, as that approach assumes that everything
is Unicode and that there is something at the other end when you ask the
font to render the UTF-8 char. As that is not necessarily true, you may
end up with tofu (missing-glyph boxes); in other words, a certain step
backwards from the situation we had in that respect in 8-bit.
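
One way to guard against tofu by hand is to ask the current font whether it has a glyph before typesetting the character. The following is only a sketch under stated assumptions: it relies on the e-TeX primitive \iffontchar as available in XeTeX and LuaTeX, and the helper name \IfGlyphInFont is hypothetical, not a kernel command:

```latex
% Compile with XeLaTeX or LuaLaTeX; under pdfLaTeX "ä" is not a single token.
\documentclass{article}
\makeatletter
% \IfGlyphInFont is a hypothetical helper name, not a kernel command.
% #1 = a single character
% #2 = code used if the current font has a glyph for #1
% #3 = fallback code used otherwise
\newcommand\IfGlyphInFont[3]{%
  \iffontchar\font`#1 %
    \expandafter\@firstoftwo
  \else
    \expandafter\@secondoftwo
  \fi
  {#2}{#3}%
}
\makeatother
\begin{document}
\IfGlyphInFont{ä}{ä}{[no glyph for U+00E4]}
\end{document}
```

The test only covers the font that is current at the point of use, so it has to be repeated after every font change; it does not restore the kind of general guarantee discussed above.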

> When running on XeTeX or LuaTeX, is there an interface for getting the
> Unicode character(s) represented by a text command? (e.g. \"{a} -> ä,
> useful for string comparison). I suppose one could easily achieve that by
> locally redefining all the text commands?

Not necessary, you just need to process the definitions in the right
state, e.g.

\typeout{ä \"w}

\protected@edef\foo{ä \"w}
\show\foo

will give you this as output:

ä \"w
 > \foo=macro:
->ä \"w.
l.14 \show\foo
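
For the string-comparison use case raised in the question, two texts processed this way can be compared with \ifx. This is a minimal sketch, not part of the original message: the macro names \foo and \baz are placeholders, and it assumes that both texts are processed with \protected@edef under the same settings:

```latex
\documentclass{article}
\makeatletter
% Process both texts in the same state, as in the example above.
\protected@edef\foo{ä \"w}
\protected@edef\baz{ä \"w}
% \ifx compares the two macro definitions token by token.
\ifx\foo\baz
  \typeout{same}%
\else
  \typeout{different}%
\fi
\makeatother
\begin{document}
\end{document}
```

Since both definitions are built from identical input, the comparison is expected to report "same". Whether two different spellings of one character (such as ä and \"{a}) compare equal depends on the state in which the definitions are processed.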