Some quick remarks; I'm still trying to read the huge amount of messages.

What is the purpose of the LICR? Apparently, it's only an intermediate
step before creating the final output. That may be true in TeX, but not
in Omega, because there the LICR can be processed by external tools
(spelling, syntax, etc.). There are lots of tools using Unicode, and very
likely there will be more in the future. However, there are only a
handful of tools understanding the current LICR, and it's unlikely there
will be more (LICR macros are eventually expanded and therefore cannot be
processed anyway; the very fact that Unicode chars are actual `letter'
chars is critical). So, having true Unicode text (perhaps with tags,
which can be removed if necessary) at some point of the internal
processing is IMO an essential feature of future extensions to TeX. And
indeed Omega is an extension which can cope with that; I wouldn't like to
renounce that.

Another aim of Omega is handling language typographical features without
explicit markup. For instance: German "ck, Spanish "rr, Portuguese f{}i,
Arabic ligatures, etc. Of course, vf files can handle that, but must I
create several hundred vf files only to remove the fi ligature? Omega
translation processes can handle that very easily.

[Marcel:]
> > Anyway, Frank, I just got your last mail in my inbox (need to read the
> > details more carefully), and I think we agree that it's worth
> > exploring if there would be a substantial advantage for having some
> > engine with Unicode internal representation.

[Frank:]
> it surely is, though i'm not convinced that the time has come, given
> that the current LICR actually is as powerful (or more powerful in
> fact) than unicode ever can be.

Please, could you explain why?

[Roozbeh:]
> > Please note that with different scripts, we have different font
> > classifications also. I'm not sure if the NFSS model is suitable for
> > scripts other than Latin, Cyrillic, and Greek (ok, there are some
> > others here, like Armenian).
[Frank:]
> i grant you that the way I developed the model was by looking at fonts
> and their concepts available for languages close to Latin and so it is
> quite likely that it is not suitable for scripts which are quite
> different.
>
> However to be able to sensibly argue this I beg you to give us some
> insight about these classifications and why you think NFSS would be
> unable to model them (or say not really suitable)

I think that Roozbeh refers to the fact that the Arabic script does not
follow the occidental classification of fonts (serif, sans serif,
typewriter). The draft I've written for lambda will allow you to say:

  \scriptproperties{latin}{rmfamily = ptmr, sffamily = phvr}
  \scriptproperties{greek}{rmfamily = grtimes, sffamily = grhelv}

(the names are invented), but as you can see, it still uses the rm/sf/tt
model. If I switch from Latin to Greek and the current font is sf (i.e.,
phvr), then the Greek text is written using grhelv; but what is the sf
equivalent in the Arabic script?

Javier

_________________________________________________________________
Javier Bezos | TeX and typography
jbezos at wanadoo dot es | http://perso.wanadoo.es/jbezos/

PS. I would also like to apologize for discussing a set of macros which
has not been made public yet, but remember it's only a draft and many
things are liable to change (and maybe the final code will be quite
different; as we Spaniards say, perhaps "not even the mother who bore it
will recognize it"). Anyway, I'm going to reproduce part of a small text
I sent to the Omega list some time ago. I would like to note that I
didn't intend to move the discussion from the Omega-dev list to this
one -- it just happened.

==========

Let's now explain how TeX handles non-ASCII characters. TeX can read
Unicode files, as xmltex demonstrates, but non-ASCII chars cannot be
represented internally by TeX this way.
Instead, it uses macros which are generated by inputenc and which are in
turn expanded into a true character (or a TeX macro) by fontenc:

  é --- inputenc --> \'{e} --- fontenc --> ^^e9

That's true even for Cyrillic, Arabic, etc. characters! Omega can
represent non-ASCII chars internally, and hence actual chars are used
instead of macros (with a few exceptions). Trivial as it may seem, this
difference is in fact a HUGE one. For example, the path followed by é
will be:

  é   --an encoding ocp--+             +--- T1 font ocp ---> ^^e9
                         +--> U+00E9 --+
  \'e --fontenc (!)------+             +--- OT1 font ocp --> \OT1\'{e}

It's interesting to note that fontenc is used as a sort of input method!
(Very likely, a package with the same functionality but a different name
will be used.) For that to be accomplished using ocp's, we must note that
we can divide them into two groups: those generating Unicode from an
arbitrary input, and those rendering the resulting Unicode using suitable
(or maybe just available :-) ) fonts. The Unicode text may thus be
analyzed and transformed by external ocp's at the right place. Lambda
further divides these two groups into four (to repeat, these proposals
are liable to change):

1a) encoding: converts the source text to Unicode.

1b) input: sets input conventions. Keyboards have a limited number of
    keys, and hands a limited number of fingers. The goal of this group
    is to provide an easy way to enter Unicode chars using the most basic
    keys of a keyboard (which means ASCII chars on Latin ones). Examples
    could be:
    * ---  => em-dash (a well-known TeX input convention).
    * ij   => U+0133 (in Dutch).
    * no   => U+306E [the corresponding hiragana char]

Now we have the Unicode (with TeX tags) memory representation, which has
to be rendered:

2a) writing: contextual analysis, ligatures, spaced punctuation marks,
    and so on.

2b) font: conversion from Unicode to the local font encoding or to the
    appropriate TeX macros (if the character is not available in the
    font).
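To make group (1b) concrete, here is a sketch of what such an
input-conventions process could look like as an OTP file. This is
illustrative only: the rules are taken from the examples above, the file
name is invented, and the exact expression syntax and byte widths should
be checked against the Omega documentation before use.

  % inconv.otp -- hypothetical input-conventions OTP (sketch only).
  % An .otp source is compiled with otp2ocp and activated via \ocp
  % and \ocplist in the running job.
  input:  1;                % read 8-bit source text
  output: 2;                % write 16-bit Unicode

  expressions:

  `-'`-'`-'  =>  @"2014 ;   % --- becomes an em-dash (U+2014)
  `i'`j'     =>  @"0133 ;   % Dutch ij ligature (U+0133)
  .          =>  \1 ;       % anything else passes through unchanged

The point of the sketch is the division of labour: this process only
produces Unicode from typed input; rendering decisions are left to the
later writing and font processes (2a/2b).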
This scheme fits well with the Unicode Design Principles, which state
that Unicode deals with memory representation and not with text rendering
or fonts (which is left to "appropriate standards"). Hence, most
so-called Unicode fonts cannot render text properly in many scripts,
because they lack the required glyphs. There are also some additional
processes for "shape" changes (case, script variants, etc.).
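For reference, the inputenc/fontenc chain described earlier can be seen
at work in a minimal LaTeX document; latin1 and T1 are just one possible
pair of encodings.

  % Sketch of the chain: with latin1 input, the byte for é is made
  % active by inputenc and expands to \'{e}; with the T1 output
  % encoding, fontenc resolves \'{e} to the precomposed glyph in slot
  % "E9 of the font -- the ^^e9 of the diagram above.
  \documentclass{article}
  \usepackage[T1]{fontenc}
  \usepackage[latin1]{inputenc}
  \begin{document}
  caf\'{e}  % same printed glyph as typing the e-acute byte directly
  \end{document}

With OT1 instead of T1, the same \'{e} would instead be built from an
accent macro, which is exactly the \OT1\'{e} branch of the diagram.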