Re: LaTeX's internal char representation (UTF8 or Unicode?)

Sun, 11 Feb 2001 20:38:40 +0100

I asked the question:

> > wouldn't it be better if the internal LaTeX representation would be Unicode
> > in one or the other flavor?

Roozbeh replied:

> What about symbol fonts like TC? What about math characters that are
> unified in Unicode (\rightarrow and \longrightarrow)? What about the
> things that are not yet in Unicode?

yes, what about them? as I outlined already in other replies I don't think
that unicode or UTF8 is the answer as far as LICR is concerned. it can only
provide a partial answer

 - it clearly can't provide the answer for chars not existing in unicode
 - and it clearly can't provide the answer for math

however LICR (or the part I'm talking about) isn't really concerned with math,
which needs a far richer, or let's say different, handling anyway; and which on
the other hand doesn't need some of the mechanisms needed for text
representations, like being aware of certain types of font attribute changes
etc.

> > - however, not clear is that the resulting names are easier to read, eg
> > \unicode{00e4} viz \"a.
>
> They are worse than you may think. They are always hard to read. My real
> work is related to Unicode Arabic script, and after two years of full
> dedication, I can't recall more than a few codes. I always need a table at
> hand. I have much less experience with Knuthian names of math symbols, but
> I'm sure I can recall the names of more than 95% of them without any
> problem.

so you agree with me, they aren't easy to read :-) but then, being "internal",
this only matters in some circumstances, and Oliver put forward some good
arguments for when something like UTF8 might actually be easier to read.

> > - with intermediate forms like data written to files this could be a pain and
> > people in Russia, for example, already have this problem when they see
> > something like \cyr\CYRA\cyrn\cyrn\cyro\cyrt\cyra\cyrc\cyri\cyrya. In case
> > of unicode as the internal representation this would be true for all
> > languages (except English) while currently the Latin based ones are still
> > basically okay.
>
> This is a place where UTF8 helps a lot. People can use Unicode text
> editors to see the files, or use the widely available convertors like
> iconv to convert to theoretically every charset.

yes and no. I tried to explain that there are limitations posed by the current
implementation of the major underlying formatter (ie TeX) which you can't
easily overcome, and even if you do, the result then needs a long time to
actually get deployed at sites that have not much use for anything other than
ASCII plus perhaps a few accents.

> Unicode also has the equivalent of \", it only appears after the letter.
> So the problem of an accented letter not in Unicode is not a real problem,
> these letters can also be made in Unicode. But I don't know what are you
> going to do with the combining accent appearing after the letter.

ahh, here is the remark I was searching for an hour ago: nothing really, and
that is a problem as long as I want to stick with TeX and a bit of its parsing
machinery. and that means I can't make use of this concept.

frank
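
For readers who want to see the point about the combining accent following the letter, here is a minimal Python sketch. It is not part of the original thread; it uses only the standard `unicodedata` module, and the helper name `decompose` is made up for illustration. It shows that the precomposed character written as \"a in LaTeX (U+00E4) decomposes into the base letter followed by the combining diaeresis, the reverse of the TeX order where the accent command comes first.

```python
import unicodedata


def decompose(text):
    """Return the NFD form of text and its code points as U+XXXX labels.

    Hypothetical helper for illustration only.
    """
    nfd = unicodedata.normalize("NFD", text)
    return nfd, [f"U+{ord(ch):04X}" for ch in nfd]


# U+00E4 is the precomposed a-with-diaeresis, i.e. \"a in LaTeX notation.
nfd, points = decompose("\u00e4")
print(points)  # ['U+0061', 'U+0308']: base letter first, accent second

# The same character as UTF-8 bytes, as it would appear in a file.
print("\u00e4".encode("utf-8"))  # b'\xc3\xa4'

# Recomposing the sequence gives back the single precomposed code point.
print(unicodedata.normalize("NFC", "a\u0308") == "\u00e4")  # True
```

A macro like \" reads its argument after the accent, so a combining mark that trails the letter does not fit that parsing model without extra machinery, which is the difficulty described above.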