LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Frank Mittelbach <[log in to unmask]>
Sat, 17 Feb 2001 22:54:05 +0100

Frank Mittelbach writes:
 >
 > A few people will unfortunately get this posting twice since it is sent
 > both to LATEX-L and to the Omega developers list (several of whom are on
 > LaTeX-L); sorry for that.
 >
 > We thought this advisable as we make a number of suggestions regarding
 > extensions/changes to Omega's character token processing. (Any technical
 > discussion of these suggestions should probably be confined to the Omega
 > developers list, though.)

fortunately it is a weekend and nobody has told us yet ... so we can at
least claim we found out ourselves shortly after sending the message:
Omega already has input modes and input translations which support what we
were asking for, ie the translation from the source document to the internal
Unicode form.

thus OICR1 = OICR2, and all our rambling about the distinction between the
two was wrong.
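
just to record what that means in practice: in the straightforward case one
only has to switch the translation of the current file once, at top level,
and everything from the next line on arrives in the internal form (an
untested sketch, using the same inutf8 ocp as in the example below):

    \ocp\OCPa=inutf8                    % the utf8 input ocp used below
    \InputTranslation currentfile\OCPa  % from the next line of this file on,
                                        % the source is read as utf8 and
                                        % turned into the internal unicode form

    \def\foo{abcäd}   % ä now ends up as a single internal character
    \show\foo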

what seems to remain is

 a) problems with controlling these input translations: the way it works in
 Omega, according to the documentation, is that a change applies from the next
 line of the file onwards. However, consider an example like the following:

    \ocp\OCPa=inutf8

    \def\foo{abcäd} % default seems to be latin1
    \show\foo


    % the following fails (not surprisingly)
    % and can't be corrected later on

    \def\foo{ab
    \InputTranslation currentfile\OCPa
    cä}
    \show\foo


 the second \foo now contains the tokens

   \foo=macro:
    ->ab \InputTranslation currentfile\OCPa c^^c3^^a4.

 thus if you ever use this \foo later on you will get the wrong characters,
 because the input was umlaut-a in utf8 but what is stored in \foo are the
 _two_ latin1 characters uppercase-A-with-tilde and currency-sign (the two
 bytes of the utf8 encoding of the umlaut, read individually).

 furthermore, wherever this \foo is used later, it will switch the input
 translation of whatever file is being read at that point to utf8 from the
 next line on, and that could be a completely different file.

 This might look like a contrived example, but at a higher level of macro
 coding this type of problem will happen whenever an \InputTranslation is
 used, either directly or within some macro definition (such as a language
 tag), and is then placed, for example, inside the argument of some other tag.

 Since we have been asked to provide input encoding changes for LaTeX within
 paragraphs, eg for individual words, something like this would happen
 whenever such a change appears, say, inside the argument of \section
 (roughly as sketched below).
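
 concretely, something like the following would leave the \InputTranslation
 frozen inside the moving argument (an untested sketch, same inutf8 ocp as
 above; the .toc behaviour is our reading of what would happen, not something
 we have tried):

    \ocp\OCPa=inutf8

    % same problem as with \foo above, just one level up:
    \section{ab
    \InputTranslation currentfile\OCPa
    cä}
    % the ä again ends up as two latin1 characters in the argument, and the
    % frozen \InputTranslation fires whenever the argument tokens are reused,
    % eg (presumably) while the .toc file is being read back in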


 b) the other problem that seems to remain is:

 > Another problem of the current model seems to be that, even if trans A did
 > the encoding transformation to Unicode, ie we have only a single OICR,
 > transformations of type D (ie transformations of character token strings)
 > can't be controlled by a mechanism similar to the one that is available for
 > transformations of type C: in one case we have ocps, while in the other area
 > (when we work on structural issues like building a TOC or arranging data for
 > page representation) no such mechanism is available. Thus it seems
 > interesting to think about whether or not a similar concept (not necessarily
 > the same!) should be made available for this part of the process.
 >
 > In other words, the concept of ocps makes perfect sense for character string
 > manipulation, but one has to [pretend] to typeset something to have them
 > available in current Omega, while a large amount of document processing is
 > concerned with character string manipulation not related to typesetting at
 > all.
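
 to make that last point concrete: as far as we can see, the only way in
 current Omega to run an ocp over a character string is to push it onto an
 ocp list and then typeset the string, eg into a box. Roughly (an untested
 sketch; \myocp and its ocp file are hypothetical stand-ins for some
 string-transforming ocp):

    \ocp\myocp=somefile       % hypothetical ocp doing the transformation
    \ocplist\MyList=\addbeforeocplist 1 \myocp \nullocplist

    % to "transform" a string we have to pretend to typeset it:
    \setbox0=\hbox{\pushocplist\MyList some text\popocplist}

    % the result now exists only as typeset material inside \box0; there is
    % no way to get it back as a character token string for further processing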

what is no longer a problem, though, is the example we gave for the above,
since for that particular case (writing to output streams) Omega provides
output translations.
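
ie for writing, roughly something like this should work (an untested sketch;
we are assuming \OutputTranslation takes a write stream plus an ocp,
analogously to \InputTranslation, and \OCPw stands for some hypothetical ocp
mapping the internal form back to utf8):

    \ocp\OCPw=somefile                % hypothetical ocp: internal form -> utf8
    \newwrite\mystream
    \immediate\openout\mystream=test.txt
    \OutputTranslation\mystream\OCPw  % translate everything written to this
                                      % stream back to utf8 on output
    \immediate\write\mystream{abcäd}
    \immediate\closeout\mystream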

hope that with this correction we got a little closer to the truth :-)

frank & chris
