LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

From:     Frank Mittelbach <[log in to unmask]>
Reply To: Mailing list for the LaTeX3 project <[log in to unmask]>
Date:     Wed, 21 May 2014 11:31:59 +0200

In my opinion the Unicode consortium has not screwed up (backspace
backspace backspace ...) has not found the best possible solution for
math, and there is no way to *properly* reconcile the two worlds.

Unicode started out as an attempt to codify the plain-text letters of all
languages. One of the most important axioms in that respect was the idea
that a "letter" is an abstract entity, e.g., Latin-small-a, and that
different glyphs in fonts all represent that single entity "a" regardless
of the shape or form it takes. So attributes like bold or serif/sans etc.
are all outside the scope of Unicode encoding.

That makes sense if you try to convey textual meaning: a "word" has a
meaning regardless of whether it is set in italics or bold or both. (Of
course such attributes extend the semantics, e.g., bold may indicate a
heading or italic some emphasis, but underneath that "word" still has a
meaning of its own in the language.)

The problem with math, though, is that symbols in math are traditionally
not just defined by an abstracted shape: the mathematical community early
on used additional attributes of glyphs to convey semantics. So
bold-lowercase-latin-letters may denote vectors, and in one formula an
integral symbol and a bold-integral may have totally different semantics.
On top of that, the semantics may change from field to field or even from
paper to paper (so other than calling it a bold-integral there is no way
to describe such symbols semantically).
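
To make that concrete, here is a minimal LaTeX2e sketch (it uses the bm
package for the bold integral; the formulas themselves are of course only
illustrative):

  \documentclass{article}
  \usepackage{amsmath}
  \usepackage{bm}   % \bm emboldens arbitrary math material
  \begin{document}
  % bold lower-case Latin letters used as vectors: \mathbf{v} is,
  % semantically, a different symbol from the scalar v
  \[ \mathbf{v} = (v_1, v_2, v_3) \]
  % an integral and a bold integral may carry entirely different
  % meanings within one paper; typographically \bm just changes an
  % attribute, semantically it creates a new symbol
  \[ \int f \,d\mu \qquad \bm{\int} f \,d\mu \]
  \end{document}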

The problem with this is that mathematicians have ended up using
effectively any kind of symbol/letter to denote specific semantics, and
long ago started to use all kinds of attributes (which Unicode, on the
plain-text level, regards as irrelevant) to indicate semantics too. The
main point here is that the moment that happens, the attributes become
frozen and symbol+attribute becomes a relevant symbol in its own right.

As a result, to express the language of mathematics Unicode would have
needed to codify every kind of letter/symbol+attribute(s) combination as
an individual code point, which is a difficult if not impossible task.

Nevertheless, they went for this approach to some extent by codifying
mathematical alphabets (mainly digits+a-z+A-Z plus some Greek) and of
course a large number of symbols.
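
These alphabets live in the Mathematical Alphanumeric Symbols block
starting at U+1D400. A minimal sketch of how they can be addressed as
symbols in their own right (this assumes a Unicode engine and a current
unicode-math; the font choice is just an example):

  % a few of the encoded code points:
  %   U+1D400  MATHEMATICAL BOLD CAPITAL A
  %   U+1D49C  MATHEMATICAL SCRIPT CAPITAL A
  %   U+1D504  MATHEMATICAL FRAKTUR CAPITAL A
  \documentclass{article}
  \usepackage{unicode-math}        % requires XeLaTeX or LuaLaTeX
  \setmathfont{Latin Modern Math}
  \begin{document}
  % \symbf, \symscr, \symfrak produce those code points, i.e. encoded
  % symbols rather than just a font change
  \[ \symbf{A} \quad \symscr{A} \quad \symfrak{A} \]
  \end{document}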

In the Unicode book it says:

  The alphabets in this block encode only semantic distinction, but not
  which font will be used to supply the actual plain, script, Fraktur
  [...] Characters from the Mathematical Alphanumeric Symbol block are
  not to be used for nonmathematical styled text.

  All mathematical alphanumeric symbols have compatibility decompositions
  to the base Latin and Greek letters. This does not imply that the use
  of these characters (I guess the base ones - Frank) is discouraged for
  mathematical use. Folding away such distinctions [..] is usually not
  desirable, however, as it loses the semantic distinction for which
  these characters are encoded.

That is all true and sensible, and to explicitly encode that something is
a math-calligraphic S and not just a Latin-S (that happens to be set in
some calligraphic font) is desirable when passing data from one
application to the next, as the font information is likely to be lost and
with it the semantics.

However, it is by no means offering a full codification of mathematical
semantics, so at the end of the day you may end up with a mixture of
"properly" encoded material plus stuff that lost the semantic distinction.

The good part is that it covers a lot, but it is not comprehensive by any
means and can't be, due to the approach chosen.

It reminds me a bit of a talk I heard recently where somebody was
advocating the use of Unicode sub/superscript digits to avoid having to
type _2 or ^3, arguing that this is easier, nicer, and more readable.
Well, to me it isn't, the moment you get to real math, because then it
gets inconsistent and you end up with mixed syntax.
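
A sketch of what that mixture looks like (purely illustrative; Unicode
encodes only a handful of sub/superscript characters, e.g. U+2082
SUBSCRIPT TWO and U+00B3 SUPERSCRIPT THREE):

  H₂O, x³                 % fine while only single digits are involved
  x_{n+1}, e^{2\pi i t}   % anything beyond that falls back to TeX syntax,
                          % so a real document mixes both notations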

For the same reason I believe that it would have been better to approach
math alphabets differently in Unicode and, instead of codifying a few
(with limited letter sets), acknowledge the fact that this "language" has
a meta level where symbol+attribute encodes semantics, and not just the
symbol as such.

Anyway, this is neither here nor there, as this is what Unicode offers nowadays.

So where does it fail?

  - in the case of attributed mathematical symbols, most prominently
bold as offered by the bm package, resulting in new symbols as far as
the semantics are concerned

  - in the case of multi-letter symbols (which require a fixed font,
i.e., frozen attributes, but with kerning for aesthetic reasons)

  - in the case of alphabets which have not been considered (like two
distinct calligraphic alphabets in parallel, or old German script \neq
Fraktur (as my Algebra prof did), or Cyrillic, or ...); see the sketch
after this list

  - in the fact that it does not support diacritics for those alphabets
(a minor case though)
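
For the "two calligraphic alphabets" case, a LaTeX2e sketch (the rsfs
font is a real choice, Ralph Smith's Formal Script; the command name
\mathscrf is made up for the example):

  \documentclass{article}
  % a second script/calligraphic alphabet next to \mathcal
  \DeclareMathAlphabet{\mathscrf}{U}{rsfs}{m}{n}
  \begin{document}
  % two visually and semantically distinct "calligraphic" F's, but
  % Unicode offers only one script/calligraphic alphabet to map them to
  \[ \mathcal{F} \neq \mathscrf{F} \]
  \end{document}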

LaTeX2e's math support codified most of the needs of the mathematics
language, albeit only within its own domain (that is, within the LaTeX
syntax), i.e., it wasn't supporting any Unicode code points for math (as
they didn't exist). So something like \mathbf was defining individual
bold math letters (for which Unicode now has its own code points, as long
as they are basic Latin), but it was also offering this for word-like
symbols such as \mathbf{Set}.
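
In other words, the same LaTeX2e command serves two cases that map
differently to Unicode (fragment only, for illustration):

  \[ \mathbf{x} \cdot \mathbf{y} \]    % single letters: these now have
                                       % their own bold-alphabet code points
  \[ \mathbf{Set} \to \mathbf{Grp} \]  % word-like symbols: no dedicated
                                       % code points, and they need the
                                       % kerning of a real text font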

So if one now maps that to a full-fledged text font that supports
kerning, you lose the code point semantic distinction outside LaTeX; and
if you map it to the Unicode plane, then you have to deal manually with
kerning for multi-letter sequences (which is non-trivial and can't be
perfect) or live with horrible spacing.

Or you need to change the interface in LaTeX and offer different
commands, or you change the internals and distinguish between
single-letter and multi-letter arguments. Or ...
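
One possible shape of such an interface split (roughly what the
unicode-math package offers nowadays; shown only as an illustration, not
as the definitive answer):

  \documentclass{article}
  \usepackage{unicode-math}        % XeLaTeX or LuaLaTeX
  \setmathfont{Latin Modern Math}
  \begin{document}
  \[ \symbf{v} \]     % single-letter symbol: gets the bold math alphabet
                      % code point, so the distinction survives outside TeX
  \[ \mathbf{Set} \]  % word-like symbol: typeset in a kerned (bold) text font
  \end{document}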

frank
