I asked the question:

> > wouldn't it be better if the internal LaTeX representation would be Unicode
> > in one or the other flavor?

Roozbeh replied:

> What about symbol fonts like TC? What about math characters that are
> unified in Unicode (\rightarrow and \longrightarrow)? What about the
> things that are not yet in Unicode?

yes, what about them? as I outlined already in other replies I don't think
that Unicode or UTF8 is the answer as far as LICR is concerned. it can only
provide a partial answer:

 - it clearly can't provide the answer for chars not existing in Unicode
 - and it clearly can't provide the answer for math

however LICR (or the part I'm talking about) isn't really concerned with math,
which needs a far richer, or let's say different, handling anyway; and which on
the other hand doesn't need some of the mechanisms needed for text
representations, like being aware of certain types of font attribute changes
etc.

> > - however, not clear is that the resulting names are easier to read, eg
> >   \unicode{00e4} vs. \"a.
>
> They are worse than you may think. They are always hard to read. My real
> work is related to Unicode Arabic script, and after two years of full
> dedication, I can't recall more than a few codes. I always need a table at
> hand. I have much less experience with Knuthian names of math symbols, but
> I'm sure I can recall the names of more than 95% of them without any
> problem.

so you agree with me, they aren't easy to read :-) but then, being "internal",
this only matters in some circumstances, and Oliver put forward some good
arguments for cases where something like UTF8 might actually be easier to read.

> > - with intermediate forms like data written to files this could be a pain and
> >   people in Russia, for example, already have this problem when they see
> >   something like \cyr\CYRA\cyrn\cyrn\cyro\cyrt\cyra\cyrc\cyri\cyrya. In case
> >   of unicode as the internal representation this would be true for all
> >   languages (except English) while currently the Latin based ones are still
> >   basically okay.
>
> This is a place where UTF8 helps a lot. People can use Unicode text
> editors to see the files, or use the widely available convertors like
> iconv to convert to theoretically every charset.

yes and no. I tried to explain that there are limitations posed by the current
implementation of the major underlying formatter (ie TeX) which you can't
easily overcome, and even if you do, the result then needs a long time to
actually get deployed at sites that have not much use for anything other than
ASCII plus perhaps a few accents.

> Unicode also has the equivalent of \", it only appears after the letter.
> So the problem of an accented letter not in Unicode is not a real problem,
> these letters can also be made in Unicode. But I don't know what are you
> going to do with the combining accent appearing after the letter.

ahh, here is the remark i was searching for an hour ago: nothing really, and
that is a problem as long as i want to stick with TeX and a bit of its parsing
machinery. and that means i can't make use of this concept.

frank
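
to make the ordering issue above concrete: in Unicode's decomposed form the
combining accent follows its base letter, whereas TeX's \" is a prefix command
that takes the letter as its argument. the Python sketch below is only an
illustration and not part of LaTeX or of any LICR proposal; the names ACCENTS,
decompose and to_prefix_accents are hypothetical, and the table covers just a
handful of accents.

```python
import unicodedata

# Hypothetical, deliberately tiny table: combining mark -> TeX accent command.
ACCENTS = {
    "\u0300": "\\`",   # COMBINING GRAVE ACCENT
    "\u0301": "\\'",   # COMBINING ACUTE ACCENT
    "\u0302": "\\^",   # COMBINING CIRCUMFLEX ACCENT
    "\u0303": "\\~",   # COMBINING TILDE
    "\u0308": '\\"',   # COMBINING DIAERESIS
}


def decompose(text):
    """List (code point, character name) pairs of the NFD form of text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in unicodedata.normalize("NFD", text)
    ]


def to_prefix_accents(text):
    """Rewrite postfix combining accents as prefix TeX accent commands.

    Assumption: only the marks listed in ACCENTS are handled; any other
    character is copied through unchanged.
    """
    out = []
    for ch in unicodedata.normalize("NFD", text):
        if ch in ACCENTS and out:
            # The accent arrives after its base letter, so the letter that
            # was already emitted has to be taken back and wrapped.
            base = out.pop()
            out.append(f"{ACCENTS[ch]}{{{base}}}")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(decompose("\u00e4"))
    print(to_prefix_accents("\u00e4"))
```

the converter has to go back to a letter it has already emitted once the accent
shows up; a prefix-style parser that reads a command first and its argument
afterwards has no such step, which is the mismatch discussed above.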