At 22.16 +0200 2001-05-10, Javier Bezos wrote:
>Lars said:
>
>> As I understand the Omega draft documentation, there can be no more than
>> one OTP (the \InputTranslation) acting on the input of LaTeX at any time,
>> and that OTP is only meant to handle the basic conversion from the external
>> encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
>> Unicode. All this happens way before the input gets tokenized, so there is
>> by the way no point in worrying about what the OTP should do with control
>> sequences.
>>
>> The next time any OTP gets to act on the characters is when they are being
>> put onto a horizontal list---this is where the OTPs can be stacked and one
>> OTP can act on the output of another---i.e., in the first stage of _output_
>> from LaTeX. Yet these are what is described as "Input: set of input
>> conventions" (maybe because the Omega draft documentation calls them "Input
>> filters") in the itemize-list on page 8!! (Note: I am not questioning
>> whether this is a correct summary of the debate---if I am questioning
>> anything it is rather the idea expressed in the original contribution.)
>> Certainly there is a need for some OTPs to act on the text at this stage,
>> but some of the processing should rather be done on the input side of LaTeX
>> (for which the current Omega seems to provide very little). I note that the
>
>I don't see the point of doing that.

Normalization of Unicode, for example, is something which should happen
on the input side, since LaTeX occasionally needs to determine whether
two pieces of text are equal (cf. the xinitials package).

>Processing information after full
>expansion is essentially LaTeX without inputenc and fontenc, and very little
>code will be broken. Processing the source when it's read could break lots
>of things. This means that auxiliary files will have different coding
>conventions and therefore different processes should be applied depending
>on the file to be read.
I think that is an unnecessary complication. It seems to me that what you
are trying to do is to use a modified LaTeX kernel which still does 8-bit
input and output (in particular: it encodes every character it puts onto
an hlist as an 8-bit quantity) on top of the Omega 16-bit (or whatever it
is right now) typesetting engine. While this is more powerful than the
current LaTeX in that it can, e.g., do language-specific ligature
processing without resorting to language-specific fonts, it is no better
at handling the problems related to _multilinguality_, because it still
cannot handle character sets that span more than one (8-bit) encoding.
How would, for example, the proposed code deal with the (nonsensical but
legal) input a\'{e}\k{e}\cyrya\cyrdje\cyrsacrs\cyrphk\textmu?

>One of the problems here is if the code to be moved around should be
>processed first and then moved (like floats) or moved first and then
>processed (like marks). imo, the answer is definitely the second --
>have you tried placing a caption of a figure in the outer margin
>of the page? (impossible without modifying the output routine, because
>figures and captions are first boxed and then moved). As I said,
>preserving the original code when moving it around is essential
>to avoid a mess, and in fact that is the very reason things are
>\protect'ed. This way, decisions could be taken depending on the
>final placement of the material (for example, should a Japanese
>caption be typeset vertically or horizontally?).

There are many different kinds of processing. Those that have to do with
interpreting the input have to be carried out before the material is
moved, since moving material may change its interpretation. With text
being processed as in your example, it is far from certain that the
caption can even be recognized as Japanese when it is about to be
typeset, as everything anyway seems to be reencoded in some 8-bit input
encoding before it is typeset!

Lars Hellström
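P.S. The normalization point above can be made concrete with a minimal
sketch (Python here purely for illustration, not Omega/OTP code): two
Unicode strings that render identically can nevertheless compare unequal
unless both are normalized first, which is why an equality test on
unnormalized input is unreliable.

```python
import unicodedata

# The same abstract character, encoded two ways:
precomposed = "\u00E9"    # e-acute as a single code point
decomposed = "e\u0301"    # "e" followed by a combining acute accent

# Without normalization the comparison fails, despite identical rendering.
print(precomposed == decomposed)   # False

# Normalizing both sides (here to NFC) makes the comparison behave.
nfc_a = unicodedata.normalize("NFC", precomposed)
nfc_b = unicodedata.normalize("NFC", decomposed)
print(nfc_a == nfc_b)              # True
```

If the normalization were done on the input side, everything downstream
(auxiliary files included) would see one canonical form and such tests
would just work.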