LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Re: Multilingual Encodings Summary 2.2

Lars Hellström

Fri, 11 May 2001 16:41:01 +0200

At 22.16 +0200 2001-05-10, Javier Bezos wrote:
>Lars said:
>
>> As I understand the Omega draft documentation, there can be no more than
>> one OTP (the \InputTranslation) acting on the input of LaTeX at any time
>> and that OTP is only meant to handle the basic conversion from the external
>> encoding (ASCII, latin-1, UTF-8, or whatever) to the internal 32-bit
>> Unicode. All this happens way before the input gets tokenized, so there is
>> by the way no point in worrying about what the OTP should do with control
>> sequences.
>>
>> The next time any OTP gets to act on the characters is when they are being
>> put onto a horizontal list---this is where the OTPs can be stacked and one
>> OTP can act on the output of another---i.e., in the first stage of _output_
>> from LaTeX. Yet these are what is described as "Input: set of input
>> conventions" (maybe because the Omega draft documentation calls them "Input
>> filters") in the itemize-list on page 8!! (Note: I am not questioning
>> whether this is a correct summary of the debate---if I am questioning
>> anything it is rather the idea expressed in the original contribution.)
>> Certainly there is a need for some OTPs to act on the text at this stage,
>> but some of the processing should rather be done on the input side of LaTeX
>> (for which the current Omega seems to provide very little). I note that the
>
>I don't see the point of doing that.

E.g. normalization of Unicode is something which should happen on the input
side, since LaTeX occasionally needs to determine whether two pieces of
text are equal (cf. the xinitials package).
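
To illustrate why normalization matters for equality tests, here is a minimal
sketch in Python rather than TeX. The helper name same_text and the choice of
NFC as the normalization form are assumptions made for the illustration; they
are not taken from Omega or from the xinitials package.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The raw comparison sees two different code point sequences,
# although both render as the same letter.
print(precomposed == decomposed)

# After normalization the two spellings compare equal.
print(same_text(precomposed, decomposed))
```

If the comparison is made on unnormalized text, two visually identical
pieces of input can be treated as different.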

>Processing information after full
>expansion is essentially LaTeX without inputenc and fontenc, and very little
>code will be broken. Processing the source when it's read could break a lot
>of things. This means that auxiliary files will have different coding
>conventions and therefore different processes should be applied depending
>on the file to be read. I think that is an unnecessary complication.

It seems to me that what you are trying to do is to use a modified LaTeX
kernel which still does 8-bit input and output (in particular: it encodes
every character it puts onto an hlist as an 8-bit quantity) on top of the
Omega 16-bit (or whatever it is right now) typesetting engine. Whereas this
is more powerful than the current LaTeX in that it can e.g. do
language-specific ligature processing without resorting to
language-specific fonts, it is no better at handling the problems related
to _multilinguality_ because it still cannot handle character sets that
span more than one (8-bit) encoding. How would the proposed code, for
example, deal with the (nonsensical but legal) input
   a\'{e}\k{e}\cyrya\cyrdje\cyrsacrs\cyrphk\textmu?
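
The point can be sketched outside TeX. The Python snippet below uses a few
8-bit codecs from the Python standard library as stand-ins for 8-bit
encodings in general (this is an assumption: they are not the LaTeX font
encodings) and checks whether any single one of them covers a string with
the characters a, e-acute, e-ogonek, Cyrillic ya, Cyrillic dje and the micro
sign. The two commands \cyrsacrs and \cyrphk are left out of the sample, and
the function name codecs_covering is hypothetical.

```python
# Stand-in 8-bit encodings (assumed for illustration only).
CANDIDATE_CODECS = ["latin_1", "iso8859_2", "cp1250",
                    "iso8859_5", "cp1251", "koi8_r"]


def codecs_covering(text, codecs=CANDIDATE_CODECS):
    """Return the codecs from the list that can encode every character of text."""
    covering = []
    for name in codecs:
        try:
            text.encode(name)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this encoding
        covering.append(name)
    return covering


# a, e-acute, e-ogonek, Cyrillic ya, Cyrillic dje, micro sign
sample = "a\u00e9\u0119\u044f\u0452\u00b5"

print(codecs_covering(sample))            # whole sample
print(codecs_covering("a\u00e9\u0119"))   # Latin part only
print(codecs_covering("\u044f\u0452"))    # Cyrillic part only
```

Each part of the sample fits into some 8-bit encoding on its own, but none of
the listed encodings covers the whole string.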

>One of the problems here is if the code to be moved around should be
>processed first and then moved (like floats) or moved first and then
>processed (like marks). imo, the answer is definitely the second --
>have you tried placing a caption of a figure in the outer margin
>of the page? (impossible without modifying the output routine because
>figures and captions are first boxed and then moved). As I said,
>preserving the original code when moving it around is essential
>to avoid a mess, and in fact that is the very reason things are
>\protect'ed. This way, decisions could be taken depending on the
>final placement of the material (for example, should a Japanese
>caption be typeset vertically or horizontally?).

There are many different kinds of processing. Those that have to do with
interpreting the input have to be carried out before the material is moved
as moving material may change its interpretation. With text being processed
as in your example it is far from certain that the caption can even be
recognized as Japanese when it is about to be typeset, as everything
seems to be reencoded in some 8-bit input encoding before it is typeset anyway!

Lars Hellström
