At 13:12 +0200 2001/05/27, Marcel Oliver wrote:
>...it looks like there are a couple of strategies:
>1. Store the full language context with every character token sequence
> along the lines that Javier suggests.
I think that this might turn out to be a no-no simply for the sake of
speed: Characters are at such a fundamental level that they should be
computationally as simple as possible.
I got the impression that the current Omega uses only 16-bit characters
(right?). It is, however, possible in C/C++ to guarantee an integral type
with at least 32 bits, if one stays away from wchar_t: unsigned long, for
example, is required to be at least 32 bits wide.
>2. Treat input encoding completely separate from language context.
> Input encoding just determines how to get from an arbitrary
> encoding to the Unicode(-like) ICR. Thus, switches in the language
> context have to be tagged explicitly by the user.
>3. Extreme version of 2 (the only strategy that seems to be cleanly
> implementable on current Omega):
> We simply define the \InputTranslation to be fixed on a per-file
I am thinking of a hybrid of these two:
One advantage of the last one, 3, is that formats become independent of IO
encodings: If there is a mechanism external to the file selecting the
encoding, it will be possible to choose the encoding of .aux files etc.,
and then get Omega to read it back without changing any pre-compiled
format. If the only mechanism is selecting encoding from within a file that
is compiled, this will not be possible.
> In other words, we acknowledge that it does not make any
> sense in terms of usability to mix input encodings, as such files
> simply cannot (and should not) be displayed cleanly in any editor.
This does not follow: One can easily define a translation that can handle
different input encodings in the same file.
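
As a rough illustration of such a translation, here is a minimal Python
sketch (not Omega code, and not an OTP). It assumes a made-up in-band
marker, ESC followed by "[enc=NAME]", that tells the reader which encoding
applies to the bytes that follow. The marker syntax and the name
decode_mixed are hypothetical; the sketch also assumes byte-oriented
encodings in which the marker bytes cannot occur by accident.

```python
import codecs

# Hypothetical in-band switch marker; this is an assumption for the sketch,
# not a feature of Omega or of any existing file format.
MARK_OPEN = b"\x1b[enc="
MARK_CLOSE = b"]"


def decode_mixed(data: bytes, default: str = "ascii") -> str:
    """Decode bytes whose encoding changes at explicit switch markers."""
    pieces = []
    encoding = default
    pos = 0
    while pos < len(data):
        start = data.find(MARK_OPEN, pos)
        if start == -1:
            # No further switch: decode the rest in the current encoding.
            pieces.append(data[pos:].decode(encoding))
            break
        # Decode everything up to the marker in the current encoding.
        pieces.append(data[pos:start].decode(encoding))
        end = data.find(MARK_CLOSE, start + len(MARK_OPEN))
        if end == -1:
            raise ValueError("unterminated encoding switch")
        name = data[start + len(MARK_OPEN):end].decode("ascii")
        codecs.lookup(name)  # raises LookupError for an unknown encoding
        encoding = name
        pos = end + 1
    return "".join(pieces)


sample = (
    b"abc "
    + b"\x1b[enc=latin-1]" + "\u00e9".encode("latin-1")
    + b"\x1b[enc=utf-8]" + "\u03bb".encode("utf-8")
)
text = decode_mixed(sample)  # ASCII, then Latin-1, then UTF-8 in one stream
```

The point of the sketch is only that the reader always knows, from the
bytes it has already seen, which encoding applies next.
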
The requirement is instead that the translator, reading the file byte by
byte, must know when and how to switch. If you integrate these switches
with TeX's macro system, then switches can be hard to predict, but that is
On the other hand, Robin Fairbairns didn't like approach 3, because the
directory might become littered with files indicating the encoding.
So why not do this: When Omega starts, one indicates the encoding in the
first file that Omega is reading. This would be a mode (cf. Omega draft, ch.
12), plus an OTP (loc. cit., ch. 8). There can be some simplifying defaults
corresponding to formats that editors can handle (like ASCII and Unicode).
Then other files can be opened using information about mode + OTP, as I
understand is the case now.
But in addition, one can provide external encoding information about a file
that overrides the translation information in the command opening the file.
This way, even though a format is compiled to write and read .aux files in,
say, Unicode, one may override it and get Omega to write and read .aux files
in, say, UTF-8.
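
The precedence this implies can be sketched in a few lines of Python. This
is only an illustration under assumptions: the function name
resolve_encoding, the idea of keying external information by file
extension, and the use of "utf-16" as a stand-in for "Unicode" are all
hypothetical and not part of Omega.

```python
import os
from typing import Mapping, Optional


def resolve_encoding(filename: str,
                     command_encoding: Optional[str],
                     external_overrides: Mapping[str, str],
                     default: str = "ascii") -> str:
    """Pick the encoding for a file, most specific source first.

    1. external information supplied outside the file (keyed by extension),
    2. the encoding named in the command that opens the file,
    3. a simplifying default.
    """
    extension = os.path.splitext(filename)[1]
    if extension in external_overrides:
        return external_overrides[extension]
    if command_encoding is not None:
        return command_encoding
    return default


# The format was compiled to use "utf-16" for .aux files, but the user
# supplies external information asking for UTF-8 instead.
chosen = resolve_encoding("paper.aux", "utf-16", {".aux": "utf-8"})
```

With no external information the encoding from the opening command is
used, so a pre-compiled format keeps working unchanged.
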
The question though, when playing around with these ideas, is how people
will use the features implemented.