Subject: Re: XML, UTF-8 and TeX engines
From: William F Hammond <[log in to unmask]>
Reply To: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Fri, 18 Jul 2003 14:01:31 -0400
Content-Type: text/plain

Torsten Bronger <[log in to unmask]> writes:

> >>     XML doesn't do it and I find this very convenient.  In (La)TeX, for
> >
> > This statement about XML is not helpful without reference to a
> > particular document type.

So I guess you accept that point, but perhaps I did not need to say it.

> > For example, gellmu article does provide elements called "math" and
> > "displaymath", which when formatted to LaTeX switch on the
> > corresponding math mode.
>
> Well, you cut off the citation too soon.  I am talking about
> character treatment only, and in
> -- or XML file format -- must provide a way to mark math areas, and
> it must apply rules or whatever to typeset accordingly."

Yes, sorry, I was reading too fast.

> > As LaTeX is evolving it will be possible for gellmu's "alpha" (an
> > empty element marked up in Gellmu source as \alpha) to be formatted in
> > LaTeX as (math) \alpha when recursively inside a math element and not
> > inside either of gellmu's "mbox" or "text", while outside of math
> > "alpha" could easily be morphed to a suitable unicode point.
>
> So you distinguish between both cases within your Gellmu tools?
> Okay, we have to, I do so, too; but actually I think that this is
> something that the typesetter should provide.  So, an \alpha in math
> mode should be cmmi, and in text mode it must be part of a Greek
> word.

One way or another there should be a distinction.

But I want gellmu article to be able to reach xhtml+mathml and for
this I want to have a source markup way of identifying math symbols.
For that purpose it is convenient for me to hold on to <alpha/> (the
xml form of \alpha) until the end of any pipeline.  Beyond that I
think it an inefficient use of xml structure to look individually at
items of cdata.  So my formatter is willing to think about how to
handle <alpha/> but not about how to handle α (which will be
understood only as the unicode object that it is and which, therefore,
should not be found loose inside math).  (The last sentence is
supposed to have a single U+03B1 that is UTF-8 encoded; I don't know
what will happen in the mail.)
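
To make the distinction concrete, here is a rough sketch of a formatter
that resolves the empty alpha element by context.  It is not gellmu's
actual code: the function name render, the use of Python's
xml.etree.ElementTree, the "p" wrapper element, and the choice to emit
\mbox{...} for both "mbox" and "text" are all assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch, not gellmu's implementation.
# Elements that switch back to text mode inside math (assumed set).
TEXT_ISLANDS = ("mbox", "text")


def render(elem, in_math=False):
    """Format an element tree to LaTeX-like text, deciding per context
    whether the empty alpha element is a math symbol or a character."""
    if elem.tag == "alpha":
        # Inside math: the math symbol.  Outside: the code point U+03B1.
        return "\\alpha " if in_math else "\u03b1"
    if elem.tag == "math":
        inner_math, left, right = True, "$", "$"
    elif elem.tag in TEXT_ISLANDS:
        # Assumption: both islands are emitted as \mbox{...}.
        inner_math, left, right = False, "\\mbox{", "}"
    else:
        inner_math, left, right = in_math, "", ""
    body = elem.text or ""
    for child in elem:
        # Character data after a child (its tail) is passed through
        # untouched; only element nodes are inspected.
        body += render(child, inner_math) + (child.tail or "")
    return left + body + right


source = (
    "<p>The letter <alpha/> and "
    "<math>x + <alpha/><mbox> (<alpha/>)</mbox></math>.</p>"
)
result = render(ET.fromstring(source))
print(result)
```

Note that the sketch only ever branches on element names; the
character data is copied as is, which is the point about not looking
individually at items of cdata.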

-- Bill