>> And regarding font transformation, they should be handled by fonts,
>> but the main problem is that metric information (i.e., tfm) cannot
>> be modified from within TeX, except for a few parameters; I really
>> wonder if allowing more changes, mainly ligatures, is feasible (that
>> solution would be better than font ocp's and vf's, I think).
>
> I don't understand this. What kind of font transformations are you
> referring to?

For example, removing the fi ligature in Turkish. Or using an
alternate orthography in languages with contextual analysis.

>> Semantically or visually?
>
> I suspect Frank considers meaning to be a semantic concept, not a
> visual one.

I also suspect that, but then if we pick a character it will be
visually undefined, and its rendering (and TeX is essentially about
rendering) will _always_ need additional information about the context
(example: traditional ideograms in Japanese vs. simplified ones in
Chinese).

> I believe one of the main problems for multilinguality in LaTeX today
> is that there is no way of recording (or maybe even of determining)
> the current context so that this information can be moved around with
> every piece of code affected by it. Hence most current commands
> strive instead to convert the code to a context-free representation
> (the LICR) by use of protected expansion.

In such a case, we must find a way. Without it, proper rendering is
impossible. Of course, we may write an ideogram to the aux file as a
macro; for example, ai (love) could be written as \japaneseai or
\chineseai depending on the context in which it appears, but that
means the resulting code is not very different from the current mess
with Russian, where we have \cyrA, \cyrB, etc. That's exactly what I
want to avoid. Of course, that also means that changing things
depending on the target will become more difficult. Further, by doing
so we are again creating a closed system with its own conventions and
no links to external tools adapted to Unicode.
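The point about external tools can be sketched concretely. Below is a minimal Python illustration of the asymmetry: with a Unicode-encoded source, a generic tool can identify and extract characters using only standard Unicode metadata, whereas a macro-encoded source requires a TeX-aware scanner plus a hand-maintained table mapping every private macro name back to its character. (The macro \japaneseai is the hypothetical name from the discussion above, not an actual LaTeX command.)

```python
import re
import unicodedata

unicode_src = "El ni\u00f1o am\u00f3 a \u611b"   # plain Unicode text
macro_src   = r"El ni\~no am\'o a \japaneseai"  # macro-encoded text

# With Unicode, a generic tool can find CJK ideographs directly from
# the character database, with no knowledge of TeX at all:
cjk = [ch for ch in unicode_src if "CJK" in unicodedata.name(ch, "")]
print(cjk)  # ['愛']

# With macros, the same tool needs a TeX-ish parser and a lookup table
# that someone must maintain by hand for every such macro:
macro_table = {"japaneseai": "\u611b"}  # hand-maintained mapping
macros = re.findall(r"\\([A-Za-z]+)", macro_src)
print([macro_table[m] for m in macros if m in macro_table])  # ['愛']
```

The first approach works for any script with no per-language effort; the second effectively re-creates the \cyrA-style closed system.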
I will be able to process a file and extract information from it with,
say, Python very easily if it uses a known representation (ISO
encodings or Unicode), but if we have to parse things like \japaneseai
or similar, things become more difficult. I think it's a lot easier to
move information along with blocks of text than with single
characters.

I don't understand why we cannot determine the current language
context--either I'm missing something or I'm very optimistic about the
capabilities of TeX. Please, could you give an example where the
current language cannot be determined and/or moved?

>>> But such characters (the Spanish as well as the Hebrew) aren't
>>> allowed in names in LaTeX!
>>
>> But they should be allowed in the future if we want a true
>> multilingual environment.
>
> Why? They are not part of any text, but part of the markup!

Are you suggesting that the Japanese, Chinese, Tibetans, Arabs,
Persians, Greeks, Russians, etc. must use the Latin alphabet *always*?
That's not truly multilingual--maybe of interest to Occidental
scholars, but not to people actually using these scripts and keyboards
with these scripts. (Particularly messy is mixing right-to-left
scripts with Latin.)

> Isn't the \char primitive in Omega able to produce arbitrary
> characters (at least arbitrary characters in the basic multilingual
> plane)?

Not exactly. The \char primitive is a char, but not intrinsically
Unicode--ocp's are also applied to \char (and therefore the characters
are transcoded).

> It looks quite reasonable to me, and it is certainly much better than
> the processing depicted in the example. Does this mean that the
> example should rather be
>
>   A     B     C           D          E
>   \'e   \'e   e^^^^0301   ^^^^00e9   ^^e9

As currently implemented, yes, it should. I'm still not sure whether
normalizing in this way is the best solution. However, I find the
arguments in the Unicode book in favour of it quite convincing.

Regards
Javier