## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Sender: Mailing list for the LaTeX3 project <[log in to unmask]>
From: Lars Hellström <[log in to unmask]>
Date: Mon, 21 May 2001 00:51:19 +0200
In-Reply-To: <[log in to unmask]>
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>

At 11.25 +0200 2001-05-20, Marcel Oliver wrote:

>Typesetting math the way it is done in TeX _is_ visual mark-up, while
>(most of) the textual mark-up in LaTeX is logical mark-up.

Is there really such a qualitative difference between math and text today? In what way does e.g.

   Euclid was a geometer.

contain more logical mark-up than

   a \in A

? Certainly math is visually more complex than text (at least in the Latin script; I'm not so sure how a comparison with Arabic would turn out), and manual spacing corrections are more common, but you'll have to elaborate that idea before I buy it.

>So a distinct MICR will not gain anything (and probably cause multiple
>problems) unless we support full logical mark-up. However, this is
>really a red herring. IMHO it will render LaTeX basically unusable
>for tasks it currently excels in (communication between human (!)
>mathematicians), and not add anything to areas where logical markup is
>required (because LaTeX would not be able to use most of the
>additional information anyway).

You're thinking of MathML-style typified markup here, right? I don't see anything like that in text today---there certainly isn't any LaTeX markup giving e.g. the analysis of a sentence (which is what that typified markup most resembles).

>This leaves two issues:
>
>- Mapping Unicode into the current TeX (plus AMS-fonts etc.) naming
>  scheme, so that people will eventually be able to use a Unicode
>  enabled editor for their source files. Since people from the AMS
>  (and other math publishers?) have been working on the Unicode math
>  planes, I assume that this is essentially understood.
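The mapping described in that first bullet amounts to a lookup from Unicode codepoints to TeX control sequences. A minimal sketch, assuming a simple character-by-character pass (the table and function names are invented for illustration; the macro names themselves are standard (La)TeX/AMS commands):

```python
# Illustrative sketch only: mapping Unicode math codepoints to (La)TeX
# control sequences. A real converter would also have to handle spacing,
# catcodes, and combining characters; this just shows the lookup idea.
UNICODE_TO_TEX = {
    "\u2208": r"\in",         # ELEMENT OF
    "\u2200": r"\forall",     # FOR ALL
    "\u221E": r"\infty",      # INFINITY
    "\u2265": r"\geq",        # GREATER-THAN OR EQUAL TO
    "\u2115": r"\mathbb{N}",  # DOUBLE-STRUCK CAPITAL N (amsfonts)
}

def to_tex(source: str) -> str:
    """Replace known Unicode math characters by TeX control sequences."""
    return "".join(UNICODE_TO_TEX.get(ch, ch) for ch in source)
```

So a Unicode-enabled editor could let the author type "a ∈ A" while the engine still sees the familiar `a \in A` on input.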
>- "Lost character conditions": If a font does not provide all
>  variations of a symbol that TeX or Unicode define, it should not
>  quietly resort to a many-to-one mapping, i.e., at least a warning
>  must be issued. This also seems fairly natural.

For all variations of a symbol that (La)TeX defines I can agree, but I don't agree when it comes to every variation Unicode defines. (Cf. the idea of "dumb" typesetting systems below.)

In fact, this is a special case of a more general matter: should LaTeX necessarily respect all the (more or less) duplicated characters there are in Unicode, or should it be allowed to make identifications of characters? I propose the latter. My main reason is that LaTeX is a (comparatively) smart system which can know things about the context of the text it is typesetting and thereby conclude things like "It says XXX in the manuscript, but it should most likely be YYY instead (because XXX is wrong here, but very similar to YYY, which is reasonable)."

An example: if the input contains the character U+015F (LATIN SMALL LETTER S WITH CEDILLA) and the current language is Romanian, then it should probably be a U+0219 (LATIN SMALL LETTER S WITH COMMA BELOW) instead. If the language had been Turkish, then it would have been the other way round. For dumb systems which do not have such information, I can understand that Unicode must contain a couple of glyph variants in order to produce acceptable rendering of text.

There are furthermore some conditions which should be met before such an identification is reasonable:

 * It must take place on the input side of LaTeX. Omega currently has no
   reasonable mechanism for this.

 * Which identifications are made depends on the current context; only
   those that are reasonable should be made. If the language had instead
   been English in the above example, then there are no grounds to prefer
   either character, so we should simply follow the manuscript.
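On the input side, such a language-dependent identification could be sketched as a small normalization pass. The codepoints are those discussed above; the table, function, and language names are illustrative assumptions, not anything LaTeX or Omega actually provides:

```python
# Illustrative sketch (not actual LaTeX/Omega code): language-dependent
# identification of visually similar Unicode characters on the input side.
S_CEDILLA = "\u015F"      # LATIN SMALL LETTER S WITH CEDILLA
S_COMMA_BELOW = "\u0219"  # LATIN SMALL LETTER S WITH COMMA BELOW

# Per-language preferred form. English is deliberately absent: with no
# grounds to prefer either character, the manuscript is followed as-is.
PREFERRED = {
    "romanian": {S_CEDILLA: S_COMMA_BELOW},
    "turkish": {S_COMMA_BELOW: S_CEDILLA},
}

def identify(text: str, language: str) -> str:
    """Apply only the identifications reasonable in the current context."""
    table = PREFERRED.get(language, {})
    return "".join(table.get(ch, ch) for ch in text)
```

The point of keeping the table per-language is exactly the second condition above: the same pair of characters is identified one way for Romanian, the opposite way for Turkish, and not at all for English.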
Recall that it is not unusual to see much more drastic "identification rules" in books of rules for human typesetters.

Lars Hellström