Javier Bezos writes:

> If I say
>
> \begin{mandarin}
> \newcommand{\foo}{<Unicode char corresponding to Chinese ai>}
> \end{mandarin}
>
> how TeX knows that \foo\ was defined in a Mandarin context (including
> perhaps input encoding information)? And what is expected by the user,
> that the Chinese char should be considered "conceptual" (thus rendered
> differently in Japanese and Mandarin) or that the Chinese char must be
> rendered with the simplified ideogram (ie, Mandarin vs. Japanese)?
> What makes that different from, say,
> \newcommand{\foo}{\unichar{<Unicode code>}}
> (without specifying the language)?

Oh, looks like I fell into the eurocentric mind-trap that
"character=glyph"...

So it looks like there are a few strategies:

1. Store the full language context with every character token
   sequence, along the lines that Javier suggests. In other words,
   treat the language context as part of the input encoding. If
   Frank's requirement for an ICR ("a single item must have a unique
   and well-defined meaning") is to be met, this would essentially
   mean that every character needs to be tagged with its language
   context.

2. Treat input encoding as completely separate from language context.
   The input encoding just determines how to get from an arbitrary
   encoding to the Unicode(-like) ICR. Switches in the language
   context therefore have to be tagged explicitly by the user. So the
   example would become

   \begin{utf8-encoding}
   \newcommand{\foo}{<Unicode char corresponding to Chinese ai>}
   \end{utf8-encoding}

   Now I have to say something like \mandarin{\foo} or
   \japanese{\foo}. Of course, putting the language switch into the
   definition of \foo would be legal, too.

   The main restriction of this approach is that we cannot (easily)
   do something like

   \begin{mandarin}
   \section{<...>}
   \end{mandarin}
   \begin{japanese}
   \section{<...>}
   \end{japanese}

   and expect that the language context is properly preserved in the
   TOC.
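To make the explicit tagging of strategy 2 concrete, here is a minimal sketch in plain LaTeX. It is only an illustration under stated assumptions: \mandarin and \japanese are hypothetical stand-ins that merely tag their argument visibly (real switches would select fonts and glyph variants), \fooja is a made-up name, and the body of \foo is placeholder text rather than the actual character.

```latex
\documentclass{article}

% Hypothetical stand-ins: real language switches would select fonts,
% glyph variants and hyphenation; these only tag the text visibly.
% They are declared robust so that they are written to the .toc file
% unexpanded.
\DeclareRobustCommand{\mandarin}[1]{[zh: #1]}
\DeclareRobustCommand{\japanese}[1]{[ja: #1]}

% Placeholder for the language-neutral Unicode character.
\newcommand{\foo}{AI}

% Variant with the language switch inside the definition
% (\fooja is a hypothetical name).
\newcommand{\fooja}{\japanese{\foo}}

\begin{document}
\tableofcontents

% The switch is part of the title, so it travels into the .toc file
% and the TOC entry is tagged as well.
\section{\mandarin{\foo}}
\section{\fooja}

% No switch in the title: the TOC entry carries no language tag.
\section{\foo}
\end{document}
```

Run LaTeX twice so that the table of contents is picked up. The first two TOC entries keep their tag because the switch is part of the section title itself; a surrounding environment would not be recorded in the .toc file in the same way.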
   a) Is it reasonable and necessary at all for the \section example
      above to work, i.e. should a TOC or index mix languages
      "automatically"?

   b) If "japanese" in the second environment were "english", one
      could simply "stack" language contexts globally. That is, below
      the primary language we can have an arbitrary number of working
      languages, each of which only determines features that
      languages higher in the hierarchy have not explicitly defined
      (such as the rendering of glyphs in certain Unicode regions).
      Only where the choices conflict (japanese vs. mandarin, for
      example) do we need local mark-up:

      \section{\japanese{<...>}}

3. Extreme version of 2 (the only strategy that seems to be cleanly
   implementable on current Omega): We simply define the
   \InputTranslation to be fixed on a per-file basis. In other words,
   we acknowledge that mixing input encodings within one file makes
   no sense in terms of usability, as such files simply cannot (and
   should not) be displayed cleanly in any editor. So preparing
   multi-encoded text must follow one of these options:

   a) Split the text into several files. (Useful for blocks of
      original source that are not subject to frequent modification.)

   b) Use UTF-8 and rely on the editor for encoding translation
      during import. (For example, the Emacs command
      insert-file-contents can do coding translation; we should also
      expect that the drag-and-drop protocols of various windowing
      systems will eventually be able to do this properly.)

   c) For legacy source, the functionality of the current inputenc
      could be provided independently of the particular ICR.

--Marcel