LATEX-L Archives

Mailing list for the LaTeX3 project


Robin Fairbairns
Mon, 14 May 2001 10:10:07 +0100
text/plain (41 lines)
> >Well, the \InputTranslation and \OutputTranslation primitives of Omega
> >already provide that functionality, so there is no need to deal with
> >variable-sized characters in the TeX programming. The problem is that one
> >might want to employ additional sets of translations (which would then act
> >on streams of equally-sized characters) between those extremes of the
> >program, but Omega doesn't provide for this.
> I am not sure what you mean here: UTF-8 is variable sized.


> I suggested that for every file not using a 32-bit character type, one has
> an additional file (in ASCII) identified by some kind of file name ending
> with information about the encoding. (For example, if the file "<name>" is
> not 32-bit, there is also an ASCII file named "<name>.encoding".)
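
For concreteness, the sidecar-file proposal quoted above could be sketched roughly as follows. The function names, the convention that the ".encoding" file holds a single codec name, and the UTF-32 default are all assumptions made for illustration; the proposal itself does not specify them.

```python
from pathlib import Path


def detect_encoding(path, default="utf-32"):
    """Return the encoding named in '<path>.encoding', or `default` if absent.

    Assumption: the sidecar is a small ASCII file whose only content is a
    codec name such as 'latin-1' or 'utf-8'.
    """
    sidecar = Path(str(path) + ".encoding")
    if sidecar.exists():
        return sidecar.read_text(encoding="ascii").strip()
    # No sidecar: treat the file as the 32-bit default of the proposal.
    return default


def read_text_with_sidecar(path, default="utf-32"):
    """Decode the whole file using the encoding its sidecar declares."""
    raw = Path(path).read_bytes()
    return raw.decode(detect_encoding(path, default))
```

The cost being objected to in the reply is visible here: every non-default file needs a second file kept in step with it.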

yeah yeah yeah; all good osi-style practice ... but no-one really uses
much of osi networking nowadays, and for good reason -- the techniques
it employs are too clunky[*] for the real world.

in practice, most people know what encodings their files are in.  and
if they're into unicode, and encoding in utf-8 or utf-16, the chance
that they'll also be using another encoding is likely rather small; if
they're using latin-1 in parallel, it'll be consumed quite happily by
a utf-8 decoder.  imposing a schema file on *everything* is wild
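
One reading of the Latin-1 remark above is a decoder that tries UTF-8 first and falls back to Latin-1. A strict UTF-8 decoder accepts the ASCII subset of Latin-1 but rejects most byte sequences containing values above 0x7F, so the fallback step below is an assumption, not something the message spells out. The function name is hypothetical.

```python
def decode_utf8_or_latin1(data):
    """Decode bytes as UTF-8 if valid, otherwise as Latin-1.

    Returns (text, codec_used). Caveat: a few Latin-1 byte sequences
    (for example 0xC3 0xA9) happen to be valid UTF-8 and will be read
    as UTF-8, so this is a heuristic rather than a guarantee.
    """
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this cannot fail.
        return data.decode("latin-1"), "latin-1"
```

Pure ASCII input and genuine UTF-8 input both take the first branch; Latin-1 text with accented letters usually takes the second.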

> This way, one can provide as many IO code converters as one bothers to
> write, without the extended TeX ever knowing anything about it. (If Omega
> uses C++ for IO, one can use something called a codecvt. Or use pipes,
> where available.)

no.  omega does (shame) use clunky old c++ for some parts of its
operation, but it uses its own ocp mechanism for transforming
encodings.  macro coding to switch ocps at input time is trivial, but
not attractive for the normal case of using the same encoding all the

[*] except in the areas "original" ip doesn't natively cope with at
all, like fully-extensible addressing and security.