LATEX-L Archives

Mailing list for the LaTeX3 project


Lars Hellström <[log in to unmask]>
Thu, 10 May 2001 14:50:29 +0200
At 01.32 +0200 2001-05-09, Marcel Oliver wrote:
>Apostolos Syropoulos has expressed interest (some time ago) to publish
>a version of this document in Eutupon (the Greek TeX Friends
>newsletter).  Therefore, I would like to make sure that the document
>is as accurate as possible, that everybody is happy with the way I
>presented his contributions, and that the external references are
>useful and complete.  So if I don't hear complaints, I assume that
>everything is cool.

I find the name of Section 2 (LaTeX Internal Character Representation)
rather strange, as there is very little in that section that concerns the
LICR. The main topic of that section seems rather to be the shortcomings of
TeX (as a typesetting engine).

The comparison in Section 3.2.1 of how characters are processed in TeX and
Omega respectively also seems strange. In the Omega case (b), column C, we
see that the LICR character \'e is converted to an 8-bit character "82
before some OTP converts it to the Unicode character "00E9 in column D.
Surely this can't be right---whenever LICR is converted to anything it
should be to full Unicode, since we will otherwise end up in an encoding
morass much worse than that in current LaTeX.
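
To make the point concrete, here is a hedged sketch of what a direct
LICR-to-Unicode mapping could look like, modelled on the real T1
declaration in t1enc.def; the encoding name "UT1" is my own invention for
illustration, not existing LaTeX code:

```latex
% Real declaration from t1enc.def: in the 8-bit T1 encoding,
% the LICR form \'e maps to slot 233 ("E9).
\DeclareTextComposite{\'}{T1}{e}{233}

% Hypothetical analogue for a full-Unicode output encoding
% (the name UT1 is an assumption): \'e should go straight
% to U+00E9, never via a font-specific 8-bit slot such as "82.
\DeclareTextComposite{\'}{UT1}{e}{"00E9}
```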

It also seems to me that there is some confusion---in the debate as well as
in the summary---about where the boundary between "input" and "output" is
located. Since LaTeX is a TeX format, it lives between the "eye" and the
"stomach", and thus to LaTeX everything that happens to text from
evaluation onwards (character tokens enter the stomach to be typeset) is
part of the output process. Much of what has been written about Omega seems
instead to draw the line between input and output at a much later position.
Hence some of the things that have been described as Omega extensions
acting on the input are, from LaTeX's view, rather yet more things that act
on the output.

As I understand the Omega draft documentation, there can be no more than
one OTP (the \InputTranslation) acting on the input of LaTeX at any time,
and that OTP is only meant to handle the basic conversion from the external
encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
Unicode. All this happens well before the input gets tokenized, so there is,
by the way, no point in worrying about what the OTP should do with control
sequences.

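For concreteness, a minimal sketch of this single input translation as I
read the draft documentation; the OCP name "inutf8" is illustrative (an
assumption on my part, though Omega distributes OCPs under similar names):

```latex
% Load a (presumed) OCP that maps UTF-8 byte sequences
% to 32-bit internal Unicode.
\ocp\InUTF=inutf8
% Apply it to everything subsequently read from the current
% file, before any tokenization takes place.
\InputTranslation currentfile \InUTF
```
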
The next time any OTP gets to act on the characters is when they are being
put onto a horizontal list---this is where the OTPs can be stacked and one
OTP can act on the output of another---i.e., in the first stage of _output_
from LaTeX. Yet these are described as "Input: set of input conventions"
(maybe because the Omega draft documentation calls them "Input filters") in
the itemize-list on page 8!! (Note: I am not questioning whether this is a
correct summary of the debate---if I am questioning anything, it is rather
the idea expressed in the original contribution.)
Certainly there is a need for some OTPs to act on the text at this stage,
but some of the processing should rather be done on the input side of LaTeX
(for which the current Omega seems to provide very little). I note that the
last paragraph of Section 3 mentions the problem that Omega does not
provide any OTP processing of text while it is between the eye and the
mouth.

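By contrast, the stacking at the horizontal-list stage looks roughly like
this in the draft documentation's interface; the OCP names and priority
numbers here are illustrative assumptions:

```latex
% Two (presumed) OCPs acting at the first stage of output:
\ocp\OCPuniToGreek=uni2greek   % e.g. Latin transliteration -> Greek
\ocp\OCPgreekLig=greekligs     % e.g. contextual ligatures/final sigma

% Stack them in an ocplist; entries added earlier in the list
% act earlier on the text.
\ocplist\GreekList=
  \addbeforeocplist 100 \OCPuniToGreek
  \addbeforeocplist 200 \OCPgreekLig
  \nullocplist

\pushocplist\GreekList  % active until \popocplist\GreekList
```
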
Lars Hellström