## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Subject:

Re: LaTeX's internal char representation (UTF8 or Unicode?)

From:

Date:

Mon, 12 Feb 2001 10:25:21 GMT

Content-Type:

text/plain

Parts/Attachments:

 text/plain (52 lines)
 text/plain (52 lines)

> > What about symbol fonts like TC? What about math characters that are
> > unified in Unicode (\rightarrow and \longrightarrow)? What about the
> > things that are not yet in Unicode?
>
> yes, what about them?

It may be worth noting that Unicode 3.1 and 3.2 will (assuming the current plans go through) have a lot more math characters than Unicode 3.0 (roughly 1500 more, if I recall correctly). Actually, one of the main things currently missing is long arrows. We (the MathML working group) are in touch with the Unicode folks to see if there's any chance of those being added as well, although time is getting short for further additions to 3.1 and 3.2.

There are always the Private Use Areas, of course, for extra characters that a TeX/Unicode system could use (although using private-use characters in a publicly distributed system is considered bad form, sometimes it can't be avoided).

Having built a TeX-based (rather than Omega-based) system, xmltex, that does use UTF-8 as the internal form of all characters, I'd agree with the comments made earlier that one of the hardest problems is Unicode combining characters (xmltex doesn't deal with them at all by default).

xmltex of course makes almost no use of TeX's built-in csname parsing for document files, as it reads XML syntax. It can run with all characters being active (which would be needed to handle combining characters), but normally ASCII characters are non-active, which is a big time saving for languages that are mainly Latin alphabet.

In xmltex one is more or less forced to use UTF-8, as you are accepting XML character streams with no explicit markup; however, for a system using TeX-style markup it isn't at all clear that the benefits would outweigh the costs. Changing to a UTF-8 internal form would make LaTeX slower (a lot slower if it handled combining characters), and for the majority of existing users it would have no advantage (so they would keep using the old system, and not update).
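As an aside on why combining characters are such a hard case for a macro-based decoder: the same visible character can arrive as one code point or as a base letter plus a combining mark, and the two are different byte sequences. A minimal Python illustration (Python here only as a neutral sketch; the original systems discussed are TeX macros):

```python
import unicodedata

# "e with acute" can arrive precomposed (U+00E9) or as "e" + combining
# acute accent (U+0301). They render identically but differ as strings,
# so a decoder keyed on one byte sequence never fires for the other.
precomposed = "\u00e9"
combining = "e\u0301"

assert precomposed != combining
assert len(precomposed.encode("utf-8")) == 2   # 0xC3 0xA9
assert len(combining.encode("utf-8")) == 3     # 0x65 0xCC 0x81

# Normalising to NFC folds the combining sequence into the precomposed
# form where one exists -- the kind of preprocessing an input layer
# would need before handing text to a character-by-character decoder.
assert unicodedata.normalize("NFC", combining) == precomposed
print("combining-character example checks passed")
```

This also shows why handling combining characters forces every character to be looked at (active, in TeX terms): an ASCII "e" might be followed by a combining mark, so even plain ASCII can no longer be passed through untouched.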
For specific uses (in particular typesetting sources derived from XML) it is possible to layer UTF-8 support over the current base. There the costs are worth it, as there is really no alternative to UTF-8 support: either you code the UTF-8 support in TeX, as in xmltex (or the similar code in the cjk package, and I think there's a utf8.sty on CTAN), or you have an external program do the translation.

If the TeX engine changes to Omega (or an Omega-like system that is similarly based on Unicode) then the rules change completely, and Unicode as the internal form becomes a much more attractive prospect. However, I'm not sure that we are quite ready to switch all TeX distributions to Omega as the default engine for LaTeX, are we?

David
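The TeX-level decoders mentioned above (xmltex, the cjk package, utf8.sty-style code) all rely on the same property of UTF-8: the lead byte alone tells you how many continuation bytes follow, so each high byte can be made active and defined to grab a fixed number of arguments. A rough Python sketch of that dispatch logic, for illustration only (function names are mine, not from any of those packages):

```python
def utf8_seq_length(lead: int) -> int:
    """Number of bytes in the UTF-8 sequence that starts with this lead byte.

    In a TeX-level decoder, each byte in this range is an active
    character whose macro reads this many further bytes as arguments.
    """
    if lead < 0x80:
        return 1          # plain ASCII; left non-active for speed
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte: {lead:#x}")

def decode_codepoint(seq: bytes) -> int:
    """Assemble a code point from one complete UTF-8 byte sequence."""
    n = utf8_seq_length(seq[0])
    if len(seq) != n:
        raise ValueError("truncated or overlong sequence")
    if n == 1:
        return seq[0]
    cp = seq[0] & (0x7F >> n)          # strip the length-marker bits
    for b in seq[1:]:
        cp = (cp << 6) | (b & 0x3F)    # each continuation byte adds 6 bits
    return cp

# U+00E9 (e with acute) is 0xC3 0xA9 in UTF-8.
assert decode_codepoint(b"\xc3\xa9") == 0xE9
```

The speed cost discussed in the message comes from exactly this mechanism: every non-ASCII byte triggers macro expansion, and handling combining characters would force ASCII bytes through the same machinery too.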