LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:
From:
David Carlisle <[log in to unmask]>
Reply To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Date:
Tue, 6 Feb 2001 16:31:23 GMT
Content-Type:
text/plain
>  but not reasonable -- unless the
> processor, like David Carlisle's xmltex, is a TeX thing -- for it to
> know that a particular character must have \ensuremath applied.

That isn't clear. A Unicode text processor is supposed to know an awful
lot about each character. It has to "know" that combining characters
combine, and is supposed to know the default writing direction of every
character, and various other properties. The property of being a math
character is really just one of these. In fact it _is_ one of those;
see
http://www.unicode.org/Public/UNIDATA/UnicodeData.html


Informative Categories

 Abbr.    Description
  Lm      Letter, Modifier
  Lo      Letter, Other
  Pc      Punctuation, Connector
  Pd      Punctuation, Dash
  Ps      Punctuation, Open
  Pe      Punctuation, Close
  Pi      Punctuation, Initial quote (may behave like Ps or Pe depending on usage)
  Pf      Punctuation, Final quote (may behave like Ps or Pe depending on usage)
  Po      Punctuation, Other
  Sm      Symbol, Math
^^^^^^^^^^^^^^^^^^^^^^
  Sc      Symbol, Currency
  Sk      Symbol, Modifier
  So      Symbol, Other


One of the problems xmltex has is that it _doesn't_ know this stuff
(and doesn't combine combining characters, for example).
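
As a rough illustration of the kind of per-character knowledge meant
here, the following Python sketch reads the general category, default
direction and combining class from the standard unicodedata module.
The helper names (describe, is_math_symbol, compose) are made up for
this example, and treating "math character" as exactly category Sm is
an assumption: plenty of characters used in maths (letters, the ASCII
hyphen) fall in other categories.

```python
import unicodedata

def describe(ch):
    """Return a few UnicodeData properties for a single character."""
    return {
        "category": unicodedata.category(ch),          # e.g. "Sm", "Sc", "Mn"
        "bidi": unicodedata.bidirectional(ch),         # default writing direction class
        "combining_class": unicodedata.combining(ch),  # 0 for non-combining characters
    }

def is_math_symbol(ch):
    # Assumption for this sketch: "math character" means general category Sm only.
    return unicodedata.category(ch) == "Sm"

def compose(text):
    # Combine base characters with following combining characters where
    # a precomposed form exists (Unicode normalization form C).
    return unicodedata.normalize("NFC", text)

if __name__ == "__main__":
    for ch in ["+", "\u2211", "$", "\u0301", "\u05d0"]:
        print(f"U+{ord(ch):04X}", describe(ch), "math" if is_math_symbol(ch) else "")
    print(compose("e\u0301") == "\u00e9")
```

A processor wanting to decide about \ensuremath could consult such a
table rather than hard-coding a list, though whether Sm alone is the
right test is a separate question.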

Unicode as currently devised hasn't got 2^32 characters, just 17 planes
of 2^16, but even so, that's probably enough. But whether the internal
canonical form is a Unicode number or a LaTeX-style 7-bit string such as
\'e, the issues of mapping between input encodings and this internal
form, and from there to font encodings, are probably about the same.
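
To make the comparison concrete, here is a small Python sketch of the
first mapping step (input encoding to internal form) done both ways.
It is only an illustration under stated assumptions: the function names
and the ACCENT_MACROS table are hypothetical, the table covers just a
handful of accents, and the second step (internal form to font
encoding) is not shown.

```python
import unicodedata

# 17 planes of 2^16 code points each.
PLANES = 17
CODE_POINTS = PLANES * 2**16  # 1114112, i.e. U+0000 .. U+10FFFF

# Hypothetical, deliberately tiny table: combining accent -> LaTeX accent command.
ACCENT_MACROS = {
    "\u0300": "\\`",   # grave
    "\u0301": "\\'",   # acute
    "\u0302": "\\^",   # circumflex
    "\u0303": "\\~",   # tilde
    "\u0308": '\\"',   # diaeresis
}

def to_internal_unicode(raw, input_encoding):
    """Input bytes -> list of Unicode code point numbers."""
    return [ord(ch) for ch in raw.decode(input_encoding)]

def to_internal_latex(raw, input_encoding):
    """Input bytes -> LaTeX-style string using 7-bit accent commands where known."""
    out = []
    for ch in raw.decode(input_encoding):
        decomposed = unicodedata.normalize("NFD", ch)
        if len(decomposed) == 2 and decomposed[1] in ACCENT_MACROS:
            # base letter plus one known accent, e.g. e-acute -> \'e
            out.append(ACCENT_MACROS[decomposed[1]] + decomposed[0])
        else:
            # anything outside the table is passed through unchanged
            out.append(ch)
    return "".join(out)

if __name__ == "__main__":
    print(CODE_POINTS)
    print(to_internal_unicode(b"\xe9", "latin-1"))
    print(to_internal_latex(b"\xc3\xa9", "utf-8"))
```

Either way, the work is in the tables: one per input encoding on the
way in, and one per font encoding on the way out.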


David
