LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Sender: Mailing list for the LaTeX3 project <[log in to unmask]>

Subject: Re: Multilingual Encodings Summary 2.2

From: Lars Hellström <[log in to unmask]>

Date: Mon, 21 May 2001 00:51:19 +0200

Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>

At 11.25 +0200 2001-05-20, Marcel Oliver wrote:
>Typesetting math the way it is
>done in TeX _is_ visual mark-up, while (most of) the textual mark-up
>in LaTeX is logical mark-up.

Is there really such a qualitative difference between math and text today?
In what way does e.g.

   Euclid was a geometer.

contain more logical mark-up than

   a \in A

?  Certainly math is visually more complex than text (at least in the Latin
script; I'm not so sure about how a comparison with Arabic would turn out),
and manual spacing corrections are more common, but you'll have to
elaborate that idea before I buy it.

>So a distinct MICR will not gain anything (and probably cause multiple
>problems) unless we support full logical mark-up.  However, this is
>really a red herring.  IMHO it will render LaTeX basically unusable
>for tasks it currently excels in (communication between human (!)
>mathematicians), and not add anything to areas where logical markup is
>required (because LaTeX would not be able to use most of the
>additional information anyway).

You're thinking of MathML-style typified markup here, right? I don't see where
there is anything like that in text today---there certainly isn't any
LaTeX markup giving e.g. the analysis of a sentence (which is what that
typified markup most resembles).

>This leaves two issues:
>
>- Mapping Unicode into the current TeX (plus AMS-fonts etc.) naming
>  scheme, so that people will eventually be able to use a Unicode
>  enabled editor for their source files.  Since people from the AMS
>  (and other math publishers?) have been working on the Unicode math
>  planes, I assume that this is essentially understood.
>
>- "Lost character conditions":  If a font does not provide all
>  variations of a symbol that TeX or Unicode define, it should not
>  quietly resort to a many-to-one mapping, i.e., at least a warning
>  must be issued.  This also seems fairly natural.

For all variations of a symbol that (La)TeX defines I can agree, but I
don't agree when it comes to every variation Unicode defines. (Cf. the idea
of "dumb" typesetting systems below.) In fact, this is a special case of a
more general matter: Should LaTeX necessarily respect all the (more or
less) duplications of characters there are in Unicode, or should it be
allowed to make identifications of characters? I propose the latter.

My main reason for this is that LaTeX is a (comparatively) smart system
which can know things about the context of the text it is typesetting and
thereby conclude things like "It says XXX in the manuscript, but should
most likely be YYY instead (because XXX is wrong here, but very similar to
YYY, which is reasonable)." An example of this could be that if the input
contains the character U+015F (LATIN SMALL LETTER S WITH CEDILLA) and the
current language is Romanian, then that should probably be a U+0219 (LATIN
SMALL LETTER S WITH COMMA BELOW) instead. If the language had been Turkish
then it would have been the other way round. For dumb systems which do not
have such information I can understand that Unicode must contain a couple
of glyph variants in order to produce acceptable rendering of text.

There are furthermore some conditions which should be met before such an
identification is reasonable:

 * It must take place on the input side of LaTeX. Omega currently has no
   reasonable mechanism for this.

 * Which identifications are made depends on the current context; only those
   that are reasonable should be made. If the language had instead been
   English in the above example then there are no grounds to prefer either
   character, so then we should simply follow the manuscript.
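
To make the Romanian/Turkish example concrete, here is a minimal sketch in
Python of a context-dependent identification applied on the input side. The
table IDENTIFICATIONS and the function identify are hypothetical names made
up for illustration; nothing like this is an existing LaTeX or Omega
interface, and the table holds only the two characters mentioned above.

```python
# Hypothetical per-language identification table (an assumption for
# illustration, not an existing LaTeX or Omega mechanism).
# U+015F = LATIN SMALL LETTER S WITH CEDILLA
# U+0219 = LATIN SMALL LETTER S WITH COMMA BELOW
IDENTIFICATIONS = {
    "romanian": {"\u015F": "\u0219"},  # cedilla form is most likely a mistake
    "turkish": {"\u0219": "\u015F"},   # comma-below form is most likely a mistake
}


def identify(text: str, language: str) -> str:
    """Apply the identifications that are reasonable for the given language.

    Languages without an entry (e.g. English) get an empty table, so the
    manuscript is followed unchanged.
    """
    table = IDENTIFICATIONS.get(language, {})
    return text.translate(str.maketrans(table))
```

With this sketch, identify("\u015F", "romanian") gives U+0219,
identify("\u0219", "turkish") gives U+015F, and for "english" both
characters pass through as they were typed.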

Recall that it is not unusual to see much more drastic "identification
rules" in books of rules for human typesetters.

Lars Hellström
