Re: XML, UTF-8 and TeX engines
Torsten Bronger <[log in to unmask]>
Fri, 18 Jul 2003 20:52:09 +0200

Hello!

William F Hammond <[log in to unmask]> writes:

> Torsten Bronger <[log in to unmask]> writes:
>
> [...]
>
>>> As LaTeX is evolving it will be possible for gellmu's "alpha"
>>> (an empty element marked up in Gellmu source as \alpha) to be
>>> formatted in LaTeX as (math) \alpha when recursively inside a
>>> math element and not inside either of gellmu's "mbox" or "text",
>>> while outside of math "alpha" could easily be morphed to a
>>> suitable unicode point.
>>
>> So you distinguish between both cases within your Gellmu tools?
>> Okay, we have to, I do so, too; but actually I think that this is
>> something that the typesetter should provide. So, an \alpha in
>> math mode should be cmmi, and in text mode it must be part of a
>> Greek word.
>
> One way or another there should be a distinction.
>
> But I want gellmu article to be able to reach xhtml+mathml and for
> this I want to have a source markup way of identifying math
> symbols.

Granted, but eventually it's MathML, and then a following processor
must cope with a Unicode alpha. Either it's something like my
Unicode --> LaTeX filter program, or it's the typesetter itself. I
strongly prefer the latter, because all other variants I've seen so
far looked more or less like kludges.

> For that purpose it is convenient for me to hold on to
> (the xml form of \alpha) until the end of any pipeline. Beyond
> that I think it inefficient use of xml structure to look
> individually at items of cdata.

I think so, too; however, db2latex and the MathML-->XSLT-->LaTeX
project (sorry, I don't know its SourceForge name at the moment)
apparently do something like that.

> So my formatter is willing to think about how to handle
> but not about how to handle α (which will be understood only as
> the unicode object that it is and which, therefore, should not be
> found loose inside math).
But then your formatter stops once it has reached XML, or it starts
from a format that has limitations similar to LaTeX's.

> (The last sentence is supposed to have a single U+03B1 that is
> UTF-8 encoded; I don't know what will happen in the mail.)

It arrived in one piece (but not as UTF-8).

Bye,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus
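For illustration, here is a minimal Python sketch of the context rule William describes for gellmu's "alpha" element. An "alpha" element becomes the math command \alpha when it sits somewhere inside a "math" element and is not inside an "mbox" or "text" element. Anywhere else it becomes the Unicode point U+03B1. The element names come from the thread. The tree layout, the function name `render`, and the flat string output are assumptions for this sketch and do not reflect actual GELLMU tooling. Math delimiters such as $...$ are deliberately left out, so only the alpha decision is shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: element names follow the gellmu names quoted in the
# thread; the document tree and output format are assumptions.
TEXT_ESCAPES = {"mbox", "text"}  # elements that switch back to text mode


def render(elem, in_math=False, in_text=False):
    """Flatten elem to a string, choosing \\alpha or U+03B1 by context."""
    # Track whether any ancestor (or elem itself) is math / mbox / text.
    in_math = in_math or elem.tag == "math"
    in_text = in_text or elem.tag in TEXT_ESCAPES
    parts = []
    if elem.tag == "alpha":
        # Empty element: math command in math mode, Unicode point otherwise.
        parts.append(r"\alpha" if in_math and not in_text else "\u03b1")
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        parts.append(render(child, in_math, in_text))
        if child.tail:  # text that follows the child inside elem
            parts.append(child.tail)
    return "".join(parts)


doc = ET.fromstring('<p>Let <math><alpha/> + 1</math> and <alpha/>.</p>')
print(render(doc))  # Let \alpha + 1 and α.
```

The two flags implement the rule literally ("inside math and not inside mbox or text"). A variant that lets the nearest enclosing element win would differ only for math nested inside an mbox that is itself inside math.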