LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


 Subject: Re: Multilingual Encodings Summary
 From: David Carlisle <[log in to unmask]>
 Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
 Date: Tue, 13 Feb 2001 16:55:07 GMT
 Content-Type: text/plain
> Every letter should be made active to look forward to find the combining
> character sequence after it, and then puts that over its own head! I don't
> think this is impossible, you need to loop until a non-combining char is
> found.

That's the easy bit.
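That look-ahead might be sketched roughly like this (illustration only: \letterhandler, \checkcombining, \curletter, \combacute and \acuteletter are hypothetical names, and \combacute is assumed to have been \let equal to the active combining-acute character):

```latex
% Each active letter stores a catcode-11 copy of itself and then
% peeks at the next token without consuming it.
\def\letterhandler#1{%
  \def\curletter{#1}%
  \futurelet\next\checkcombining}
\def\checkcombining{%
  \ifx\next\combacute
    \expandafter\acuteletter % a combining acute follows: accent it
  \else
    \expandafter\curletter   % no combining char: just the letter
  \fi}
% Eat the combining character and typeset the accented letter; a
% real implementation would loop here until no combiner follows.
\def\acuteletter#1{\'{\curletter}}
```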

The hard bit is that, having made every character active, \begin no
longer parses as the single token \begin but as \ b e g i n, so you
have to make the active definition of \ look ahead and grab all the
"letters", where "letter" means those characters that were catcode 11
until you made them 13; you have to maintain a list of all of those
and check them one by one against what's in the token stream.
Similarly, matching { } no longer works (unless you cheat and leave
those at catcodes 1 and 2), so in the end you have to write TeX's
tokeniser in TeX. Which is possible, but not especially fast, and
hard to do without breaking some add-on LaTeX package somewhere.
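The tokenisation trap is easy to demonstrate (a minimal sketch; note that even the definition has to dodge the problem, hence the standard \lowercase trick):

```latex
% Make the letter `e' active with a harmless definition.  The
% definition is written via \lowercase so that the `e' in the
% replacement text is tokenised while `e' is still a letter.
\begingroup
  \lccode`\~=`\e            % \lowercase will turn ~ into an active e
  \lowercase{\endgroup
    \def~{e}}%              the active e just produces an ordinary e
\catcode`\e=13
% From here on \begin is no longer one control word: TeX's name
% scan stops at the active e, yielding the control word \b followed
% by the active character e and the letters g, i, n -- so
% \begin{document} fails.
```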

> With math yes, but with other things no, the model is getting stable.
It's not just math. Some 40,000 (I think) Chinese characters just got
added. Unicode 2 was one plane of 2^16; Unicode 3 is 17 planes of 2^16.
That's a lot of new slots for people to suggest ways to fill; it will grow.

> it because Unicode only uses code points less
> than U+10FFFF, there is a lot of space if we want additional internal
> glyphs.

Going above 10FFFF might be dangerous (if you ever wanted a feature to
output the internal state you'd have problems), but planes 15 and 16
are set aside for private use, which gives 2^17 spare slots; that
ought to be enough.
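As a quick sanity check on the arithmetic (a plain-TeX sketch; the register name is arbitrary):

```latex
% 17 planes of 2^16 code points each:
\newcount\slots
\slots=17 \multiply\slots by 65536
\message{total: \the\slots}% 1114112 = "110000, so the top
                           % code point is "10FFFF
% the two whole private-use planes:
\slots=2 \multiply\slots by 65536
\message{private use: \the\slots}% 131072 = 2^17
\bye
```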

But I think the main problem is that it doesn't really make sense to
use Unicode internally in standard TeX (which is a 7-bit system
pretending to be 8-bit).

If LaTeX switched to use Omega (only) then
a) this might require Omega to be more stable than Omega's users would
wish, i.e. it might prematurely limit the addition of new features.
b) it would cut out people using TeX systems that don't include Omega.
You might say they should all switch to web2c TeX, but that's like
saying that everyone should use Emacs on Linux. Clearly it's true, but
it doesn't happen that way.
c) as a special case of (b), it would (at present, I think) cut out
pdflatex.
d) it would require reasonably major surgery to LaTeX's internals. It
would be possible to make documents and packages that use "documented
interfaces" still work with a new internal character handling, but
CTAN will reveal a lot of heavily used packages that, for good (or
bad) reasons, don't use documented interfaces but just redefine
arbitrary internal macros (often because there isn't a documented
interface). A lot of these would break.

So in the short to medium term it seems there have to be two versions,
latex/omega and latex/tex. How compatible they can stay as latex/omega
uses more Omega features, I am not sure.

David
