At Tue, 5 Jun 2001 13:29:19 +0100, Chris Rowley wrote:
[...]
>Therefore, rather than attempting to categorise the necessary
>information and devise suitable ways to provide it, Frank and I came
>up with the idea of simply supplying a single logical label for every
>ICR string. Since the first, and still the overwhelmingly most
>diverse, parts of this information came from the needs of multi-lingual
>documents, we called this label the `language' (maybe not a good
>choice). Our thesis is that `every text string must have a
>language-label'. The only property these labels need (and indeed are
>able) to have is that they \emph{can} help any application or
>sub-process to access the information it needs to process that text
>string.

I suggest that we use the term `context' rather than `language' here. Quoting Webster's, `context' means:

  The part of a written discourse in which a certain word, phrase or passage appears, necessary to point the meaning, as, it is hard to tell the exact meaning of a word out of context.

[snip]
>[In order to distinguish these logical language-labels from anything
>else in the TeX world let us call them LLLs.]
>
>In the context of current TeX-related systems this
>means that:
>
>-- whenever a character token list (in an ICR) is constructed or
>   moved, then its LLL must go with it;

The most common event at which a character token list is formed is when a command is grabbing one of its arguments. With the xparse package in full control these arguments can be labelled under the current TeX engine, but it is probably more reasonable to imagine that their attachment is handled by primitive mechanisms in some extension of TeX. In this case, I suspect the labels should be thought of as being nestable with separate markers for beginning and end, so that each token list that is formed gets delimited by matching begin and end labels that record the current context of the token list they were extracted from.
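
The delimiting described above can be emulated, very roughly, with macros in today's LaTeX. The following is only a sketch under stated assumptions: \currentcontext, \begincontext, \endcontext and \grablabelled are hypothetical names invented for this illustration, not existing LaTeX or xparse interfaces, and a primitive mechanism in an extended TeX would attach the labels at argument-grabbing time rather than through an explicit wrapper command.

```latex
\documentclass{article}
\makeatletter
% All names below are hypothetical; none is an existing LaTeX interface.
% The context in force at the point of use.
\newcommand*\currentcontext{english}
% Stand-ins for the begin and end labels: they only switch
% \currentcontext inside a group, so they nest.
\newcommand*\begincontext[1]{\begingroup\def\currentcontext{#1}}
\newcommand*\endcontext{\endgroup}
% Store argument #2 in the macro #1, delimited by labels that record
% the context in force when the argument was grabbed. Going through
% \toks@ keeps the argument itself unexpanded inside the \edef.
\newcommand\grablabelled[2]{%
  \toks@{#2}%
  \edef#1{\noexpand\begincontext{\currentcontext}%
          \the\toks@\noexpand\endcontext}}
\makeatother
\begin{document}
\grablabelled\savedtitle{The use of \begincontext{swedish}\"alv\endcontext}
% Show the stored token list, labels included.
\ttfamily\meaning\savedtitle
\end{document}
```

Here \meaning\savedtitle shows the Swedish labels nested inside the English ones that were added on grabbing. Because the labels travel with the token list, \savedtitle could later be used while \currentcontext is something else without losing its own context.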
Thus if we have, in an English context

  \subsubsection{The use of <begin-swedish>\"alv<end-swedish>}

(where the <..> denote such context labels), the token list becoming the argument of \subsubsection would be

  <begin-english>The use of <begin-swedish>\"alv<end-swedish><end-english>

And then it doesn't matter if it is later inserted into a table of contents set in a French context. Upon being written to an external file, the labels should be converted to suitable markup.

An interesting question is whether these labels should be explicit tokens or be hidden from the user (i.e., argument grabbing and things like \futurelet look past them). Making them explicit tokens would probably break tons of code.

As for what the labels should be to the user, I think a scheme of making them integers is pretty useless (how they are implemented is of course another matter). A better idea would be to make them some kind of property lists, i.e., containers for diverse forms of information that are indexed by some kind of names. Creating new label values from old ones, by copying the values and then changing some of them, would be useful when defining dialects.

The main problem I see with context labels is that of when they should be attached, since one cannot do any context-dependent processing before the context is determined. I can think of at least three different models:

1. Labels must be present in the input (e.g. encoded using control characters). This might be nice from an implementation point of view, but it is probably only realistic if such a system emerges and is accepted by a much wider community than that of the users of TeX, because of the problem of finding suitable editors. This doesn't seem likely.

2. Do as today, i.e., context switches are initiated when commands are executed. This has the problem that the context isn't completely known until the text is being typeset, so one cannot do any irreversible context-dependent processing until then. This seems a bit too restrictive to me.

3.
Have command-like markup for context-switching, but attach labels as part of the tokenization. This has the merit of looking like current LaTeX markup and allowing LaTeX to keep all ICR strings fully context-labelled, but it would also mean that processing of markup is a two-step process (first all language markup is processed, then all the rest). That doesn't feel right.

Lars Hellström