LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

From: Frank Mittelbach
Date: Tue, 12 Feb 2019 17:13:43 +0100
Sender: Mailing list for the LaTeX3 project
Reply-To: Mailing list for the LaTeX3 project

On 14.01.19 at 07:15, Kelly Smith wrote:

>> of course this is way before the advent of Unicode or XeTeX/LuaTeX ...
> 
> As for XeTeX and LuaTeX, it appears that UTF-8 text doesn’t get converted
> to an internal representation, according to some quick experiments I tried.

In some sense I would love to make that happen (there are also
advantages to it), but it would come at a high cost:

  - loss of speed (somewhat)
  - impossibility of using UTF-8 chars in csnames. E.g., in LuaTeX you can do

    \newcommand\Füße{}

    as long as the chars have \catcode 11. Once they are active, they
    can't be used in this way. While I personally wouldn't mind, I can
    understand that, depending on your language, you would rather want
    that to be possible.

The second point really means that it is a no-go.
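
To illustrate the second point, here is a minimal sketch. It assumes a Unicode engine (lualatex or xelatex) with the usual LaTeX format setup, where letters such as ü and ß have \catcode 11. The replacement text "feet" is only a placeholder for this example:

```latex
% Assumption: compile with lualatex or xelatex.
\documentclass{article}

% Here ü and ß are single characters of \catcode 11 (letter),
% so they can be part of a control-sequence name.
\newcommand\Füße{feet}

\begin{document}
\typeout{catcode of ü: \the\catcode`\ü}% write the category code to the log
\Füße
\end{document}
```

With an 8-bit engine where the bytes of ü and ß are active, the same \newcommand line would not define a command named \Füße.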

> I guess that it would be impractical to set up automatic conversions
> for such a large character set.

There is that.

> And, as you said, since the UTF-8 will
> survive reading and writing to and from files, an internal representation
> isn’t critical.

The downside is that you simply don't know whether a character will typeset
correctly in a given situation. That approach assumes that everything
is Unicode and that there is something at the other end when you ask the
font to render the UTF-8 char. As that is not necessarily true, you may
end up with tofu (missing-glyph boxes). In other words, it is a certain
step backwards from the situation we had in that respect in 8-bit.
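
One possible way to guard against this is to ask the current font whether it has a glyph before typesetting. The sketch below assumes the e-TeX primitive \iffontchar, which both XeTeX and LuaTeX provide. The macro name \checkglyph is made up for this example and is not an existing LaTeX interface:

```latex
% Assumption: compile with lualatex or xelatex.
\documentclass{article}

% \checkglyph is a hypothetical helper for this example.
% It reports in the log whether the current font has a glyph
% for the single character given as its argument.
\newcommand\checkglyph[1]{%
  \iffontchar\font`#1 %
    \typeout{glyph available for #1}%
  \else
    \typeout{no glyph for #1 in the current font}%
  \fi
}

\begin{document}
\checkglyph{ä}
\checkglyph{中}
\end{document}
```

This only tests the font that is current at the point of the call, so the answer can change after a font switch.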

> When running on XeTeX or LuaTeX, is there an interface for getting the
> Unicode character(s) represented by a text command? (e.g. \"{a} -> ä,
> useful for string comparison). I suppose one could easily achieve that by
> locally redefining all the text commands?

Not necessary; you just need to process the definitions in the right
state, e.g.

\typeout{ä \"w}

\makeatletter
\protected@edef\foo{ä \"w}

\show\foo

will give you this as output:

ä \"w
 > \foo=macro:
->ä \"w.
l.14 \show\foo

cheers
frank
