At 14:28 +0100 2001/02/13, Marcel Oliver wrote:
>2. Internal Representation and Output Encoding:
>
>2.1. Problems with Current TeX:
...
>This leads to a number of problems.
>
>- A sufficiently general internal multilingual representation may be
>  impossible to maintain, unless it is Unicode in disguise.

If we are speaking about tweaking TeX's internals, what is needed is a
stream of characters, where the characters can be subjected to various
operations, such as comparisons. The exact internal representation is
irrelevant.

If the implementation uses, say, C++, one could easily implement such
characters so that they can polymorphically change internal
representation. It could then be a mixture of 1-4 byte formats.

However, if one decided to implement such polymorphism by allocating each
character separately in the free store, it would be slow. Each character
would typically take up 1 (computer) word to indicate the size of the
allocation, plus the character itself with word round-off, which is
another word. That is 2 words, or at least 8 bytes, for each character.
And the latest Macs with G3 and G4 use 64- and 128-bit words.

So this suggests that what one should use for the internal representation
are 32-bit characters, encoded in some way that makes each character in
the semantic sense unique. (That is, if a group of input characters is to
be regarded as a single semantic entity, it should be replaced with a
unique 32-bit code.)

Space will be sufficient on today's computers, and using more compact
formats will not be faster, as the CPU's internals will probably compute
in larger words anyway. (That is, if one uses 16-bit characters, they will
probably first be translated into 32-bit words or larger, the CPU
operations will then be performed, and after that the result will be
translated back into 16-bit characters. It will be just as fast to work
with 32-bit characters directly, or perhaps even faster if it decreases
the need for round-offs.
Strictly speaking, which one is faster can only be determined by using a
profiler, but 32-bit characters seem to be OK.)

  Hans Aberg
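
As a rough illustration of the 32-bit scheme described above, here is a
minimal C++ sketch. The names Char32, CharTable, intern, expand and
kFirstComposite are hypothetical. Handing out codes above the Unicode
range (0x10FFFF) for multi-character groups is an assumption made for
the example, not something the message specifies.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical names throughout; this is not TeX's actual code.
using Char32 = std::uint32_t;  // one semantic character = one 32-bit code

static_assert(sizeof(Char32) == 4, "Char32 must be exactly 32 bits wide");

// First code handed out to multi-character groups. Assumption: values
// above the Unicode range (> 0x10FFFF) are free for internal use.
constexpr Char32 kFirstComposite = 0x110000;

class CharTable {
public:
    // Return the unique 32-bit code for a group of input code points.
    // A single code point maps to itself; a longer group gets a fresh
    // code the first time it is seen and the same code afterwards.
    Char32 intern(const std::vector<Char32>& group) {
        assert(!group.empty());
        if (group.size() == 1) {
            assert(group[0] < kFirstComposite);
            return group[0];
        }
        auto it = codes_.find(group);
        if (it != codes_.end()) return it->second;
        Char32 code = kFirstComposite + static_cast<Char32>(groups_.size());
        codes_.emplace(group, code);
        groups_.push_back(group);
        return code;
    }

    // Recover the original group, e.g. when writing output.
    // Throws std::out_of_range for a composite code never handed out.
    std::vector<Char32> expand(Char32 code) const {
        if (code < kFirstComposite) return {code};
        return groups_.at(code - kFirstComposite);
    }

private:
    std::map<std::vector<Char32>, Char32> codes_;  // group -> code
    std::vector<std::vector<Char32>> groups_;      // code  -> group
};
```

With this layout, comparing two semantic characters is a single integer
comparison, the stream is a flat array of 4-byte values with no
per-character allocation, and the table is consulted only on input and
output.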