## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Re: Multilingual Encodings Summary

David Carlisle, Tue, 13 Feb 2001 16:55:07 GMT

> Every letter should be made active to look forward to find the combining
> character sequence after it, and then puts that over its own head! I don't
> think this is impossible, you need to loop until a non-combining char is
> found.

That's the easy bit. The hard bit is that, having made every character active, \begin no longer parses as the begin token but as \ b e g i n, so you have to make the active definition of \ look ahead to grab all the "letters", where "letter" means those characters that were catcode 11 until you made them 13. So you have to maintain a list of all those, and check one by one against what's in the token stream. Similarly, matching { } no longer works (unless you cheat and leave those catcode 1 and 2), so in the end you have to write TeX's tokeniser in TeX. Which is possible, but not especially fast, and hard to do without breaking some add-on latex package, somewhere.

> With math yes, but with other things no, the model is getting stable.

It's not just math. 40000 (I think) Chinese characters just got added. Unicode 2 was one plane of 2^16. Unicode 3 is 17 planes of 2^16. That's a lot of new slots for people to suggest ways to fill; it will grow.

> it because Unicode only uses code points less
> than U+10FFFF, there is a lot of space if we want additional internal
> glyphs.

Going above 10FFFF might be dangerous (if you ever wanted a feature to output the internal state you'd have problems), but planes 15 and 16 are set aside for private use, which is 2^17 spare slots, which ought to be enough.

But I think the main problem is that it doesn't really make sense to use unicode internally in standard TeX (which is a 7bit system pretending to be 8bit). If latex switched to use omega (only) then:

a) This might require omega to be more stable than omega users would wish, ie it might prematurely limit addition of new features.

b) It would cut out people using tex systems that don't include omega. You might say they should all switch to web2c tex, but that's like saying that everyone should use emacs on linux. Clearly it's true, but it doesn't happen that way.

c) Special case of (b): it would (at present, I think) cut out pdflatex.

d) It would require reasonably major surgery to LaTeX internals. It would be possible to make documents and packages using "documented interfaces" still work with a new internal character handling, but ctan will reveal a lot of heavily used packages that for good (or bad) reasons don't use documented interfaces, but just redefine arbitrary macros (often because there isn't a documented interface). A lot of these would break.

So in the short to medium term it seems there have to be two versions: latex/omega and latex/tex. How compatible they can be as latex/omega uses more omega features, I am not sure.

David
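The two scanning steps discussed in the message (looping until a non-combining character is found, and making \ gather the "letters" that follow it) can be sketched outside TeX. The Python below is only an illustration under stated assumptions, not the approach anyone on the list implemented: the names `LETTERS`, `group_combining` and `scan_control_sequences` are hypothetical, the letter list is plain ASCII, and "combining" is approximated by a non-zero canonical combining class, which misses some marks. It also ignores details of TeX's real tokeniser, such as skipping spaces after a control word.

```python
import unicodedata

# Hypothetical stand-in for the list of characters that were catcode 11
# ("letter") before being made active. Plain ASCII letters only (assumption).
LETTERS = set("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")


def group_combining(text):
    """Split text into clusters of one base character plus the combining
    characters after it, scanning until a non-combining character is found.

    Simplification: a character counts as combining when its canonical
    combining class is non-zero.
    """
    clusters = []
    i = 0
    while i < len(text):
        j = i + 1
        while j < len(text) and unicodedata.combining(text[j]) != 0:
            j += 1
        clusters.append(text[i:j])
        i = j
    return clusters


def scan_control_sequences(text, letters=LETTERS):
    """Rebuild control-sequence tokens by hand, as an active backslash
    would have to: after a backslash, grab every following character that
    is in the maintained letter list, checking them one by one."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            j = i + 1
            if text[j] in letters:
                # Control word: backslash followed by a run of letters.
                while j < len(text) and text[j] in letters:
                    j += 1
            else:
                # Control symbol: backslash plus exactly one non-letter.
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            tokens.append(ch)
            i += 1
    return tokens
```

For example, `scan_control_sequences("\\begin{x}")` gives `['\\begin', '{', 'x', '}']`, and `group_combining("e\u0301a")` keeps the acute accent attached to the `e`. Even this toy version has to walk the input character by character, which hints at why doing the same inside TeX macros is "not especially fast".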