> >Well, the \InputTranslation and \OutputTranslation primitives of Omega
> >already provide that functionality, so there is no need to deal with
> >variable-sized characters in the TeX programming. The problem is that one
> >might want to employ additional sets of translations (which would then act
> >on streams of equally-sized characters) between those extremes of the
> >program, but Omega doesn't provide for this.
>
> I am not sure what you mean here: UTF-8 is variable sized.
gasp
> I suggested that for every file not using a 32-bit character type, one has
> an additional file (in ASCII) identified by some kind of file name ending
> with information about the encoding. (For example, if the file "<name>" is
> not 32-bit, there is also an ASCII file named "<name>.encoding".)
yeah yeah yeah; all good osi-style practice ... but no-one really uses
much of osi networking nowadays, and for good reason -- the techniques
it employs are too clunky[*] for the real world.
in practice, most people know what encodings their files are in. and
if they're into unicode, and encoding in utf-8 or utf-16, the chance
that they'll also be using another encoding is likely rather small; if
they're using latin-1 in parallel, the ascii part of it will be
consumed quite happily by a utf-8 decoder (only the accented
characters need a fallback). imposing a schema file on *everything* is wild
overkill.
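
a minimal sketch of coping with both encodings without any sidecar
file, assuming a strict utf-8 decoder with a latin-1 fallback. bytes
0x00-0x7f decode identically either way; latin-1 bytes above 0x7f are
rarely valid utf-8, so the strict decode fails and the fallback takes
over. `decode_text` is a hypothetical helper written for illustration,
not anything omega provides.

```python
def decode_text(data: bytes):
    """Return (text, encoding_used) for a byte string.

    Assumption: the input is either utf-8 or latin-1, nothing else.
    """
    try:
        # Strict decode: raises on any byte sequence that is not valid utf-8.
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte 0x00-0xff to a code point, so this never fails.
        return data.decode("latin-1"), "latin-1"


if __name__ == "__main__":
    for raw in (b"plain ascii", "caf\u00e9".encode("utf-8"), "caf\u00e9".encode("latin-1")):
        print(decode_text(raw))
```

this is a heuristic, not a guarantee: a few latin-1 byte sequences
happen to be valid utf-8 too, and those would be decoded as utf-8.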
> This way, one can provide as many IO code converters as one bothers to
> write, without the extended TeX ever knowing anything about it. (If Omega
> uses C++ for IO, one can use something called a codecvt. Or use pipes,
> where available.)
no. omega does (shame) use clunky old c++ for some parts of its
operation, but it uses its own ocp mechanism for transforming
encodings. macro coding to switch ocps at input time is trivial, but
not attractive for the normal case of using the same encoding all the
time.
[*] except in the areas "original" ip doesn't natively cope with at
all, like fully-extensible addressing and security.