Date: Sat, 26 May 2001 20:26:32 +0200
I'd like to bring the discussion back to the ICR issue, in particular
how a hypothetical successor to TeX should handle input encodings. I
think the point that Omega does not do it the "right" way has been
made pretty clearly.
But what should be the "right" way? I repost some thoughts from last
week that seem to have been lost among the \(var)epsilons.
--Marcel
Frank Mittelbach writes:
> > In fact, \InputEncoding was not intended for that, but only for
> > "technical" translations which apply to the whole document,
> > such as one byte -> two byte or little endian -> big endian. The
> > main problem with it is that it doesn't translate macros:
> > \def\myE{É}
> > \InputEncoding <an encoding>
> > É\myE
>
> \InputEncoding is the point where one needs to go from the external
> source encoding to the OICR; that is precisely the wound: the current
> \InputEncoding isn't doing this job fully (and, to be fair, it is not
> clear how to do it properly).
How about this:
- There is one default \InputTranslation (this, rather than
\InputEncoding, is the official name of the Omega command), which may
need to be specified at the time of format creation. This encoding
is the one that all macro names must be in, as well as the encoding
initially selected for text (I don't think it makes any sense to
allow multiply encoded macro names in a single document). As there
is no legacy cruft with regard to macro names, we may as well force
this default encoding to be UTF-8. (A rough sketch of this point
follows at the end of this message.)
- Changes to the \InputTranslation follow the usual TeX scoping rules
(this is obviously not how Omega currently does it) and take effect
immediately, during the initial tokenization. This would mean that
the characters \ { } must be at their expected positions in every
permissible encoding, but I guess that's not any more restrictive
than what we currently have. I also assume that TeX (Omega) always
knows whether it is parsing code or text, so that it can select the
default for code and the top of the encoding stack for text. (See
the second sketch at the end of this message.)
- Regarding Javier's above example: I think this is the correct and
expected behavior. I want to be able to write:
\begin{chinese}
\newcommand{\foo}{***something chinese***}
\newcommand{\bar}{***and some more chinese***}
\end{chinese}
The Chinese characters \foo\ and \bar\ are not easy to enter on a
western keyboard. If you frequently need \foo\ in your scholarly
discussion of Chinese literature, it is better to first define
macros for all the Chinese characters you need, and then just write
\verb|\foo| whenever you need \foo.
(I don't know whether this babel-like begin/end language selection
would actually be legal in the document preamble, but I think the
strategy is very natural at least. A small usage sketch follows at
the end of this message.)
- A harder problem may be how to deal with \'e and the like. Would
it be possible to force immediate expansion into the corresponding
internal Unicode token? (One conceivable mechanism is sketched
below.)
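Here are the sketches referred to above. They are only meant to make
the proposal concrete; apart from \InputTranslation itself, the
command names in them are made up.

First point (default translation fixed at format creation, macro
names always in UTF-8); \DefaultInputTranslation is a hypothetical
declaration:

  % at format-creation time:
  \DefaultInputTranslation{UTF-8}  % hypothetical
  % from now on, macro names are always read as UTF-8, whatever
  % translation is later selected for the surrounding text:
  \def\myE{É}                      % name and body both read as UTF-8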
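Second point (proposed, not current, scoping behaviour); \latinone
stands for whatever a Latin-1 translation process would be called:

  É\myE              % default translation: this É is read as UTF-8
  {\InputTranslation \latinone
   É\myE             % this É is entered as a Latin-1 byte; \myE still
  }%                   yields the character from its UTF-8 definition
  É\myE              % group closed: back to the default translation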
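Third point, a usage sketch: once \foo and \bar have been defined,
the running text never needs to contain the characters themselves:

  In this poem, \foo\ rhymes with \bar, which is unusual for ...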
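Last point, one conceivable mechanism: assume the engine provided an
expandable primitive (here called \Uchar, a made-up name) that yields
the internal Unicode token for a given code point. Then \' could be
made to expand immediately:

  \def\'#1{%
    \ifx#1e\Uchar"00E9 \fi% \'e expands to the internal token for U+00E9 (é)
    % ...one branch per base letter, or some table lookup instead...
  }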