Hans Aberg writes:

> >...
> >I would also like to see somebody translate it to TEI and then compare
> >the HTML and LaTeX formattings obtained chez Rahtz from TEI with the
> >native GELLMU formattings.

Actually, I am more interested in getting a copy for Info trees than
in a TEI copy.  (Thanks to Daniele Giacomini, there is now an SGML
version of Texinfo that formats to Texinfo.)  I guess I thought that
TEI fans might bite.  I also believe that a DocBook version would
prove useful, since DocBook is used by the Linux Documentation
Project.

> If you are in need of various translations, have you tried using Flex
> (lexical analyzer generator) and Bison (parser generator, or
> compiler-compiler), see

Are you saying that it's easier to code translations from XML using
lex and yacc descendants than using standard XML tools such as
sgmlspl, jade, or xt?  I find that hard to believe.  (Of course, the
situation before 1996 was different.)

[snip]
> -- I use them together with C++, which is convenient as the latter has
> standard string classes.

Although I've written in C, I've never gotten into C++.  Are there
good regular expression libraries for C++?

> One approach is to parse objects into something like the DOM (Document
> Object Model, http://www.w3.org/), and then onto that hook a program that
> can translate into several different formats.

Of course, sgmlspl, jade, and xt (like other standard SGML/XML tools)
provide good frameworks for translating into as many different formats
as one likes by writing, respectively, Perl, DSSSL, and XSLT.
(It might also be viable to use David Carlisle's xmltex followed by
Eitan Gurari's tex4ht, in which case one writes TeX.)

The power of sgmlspl (though not its speed) can match that of any
other method, except possibly when one wants to descend into CDATA
segments.  But if one finds oneself tempted^{1} to do that (as one
might, for example, when typesetting with TeX or LaTeX the names TeX
and LaTeX, or even the ASCII character '~', from an XML document type
that does not provide these things as empty elements^{2}), one should
instead customize one's XML document type.
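A minimal Python sketch of that alternative, assuming a document type that supplies empty elements for the awkward items: the translator never looks inside character data, only at element names.  The names tex, latex, and tld are the ones found in lfaq.xml (assuming tld stands for the tilde); the HTML and LaTeX strings they map to are this sketch's own choices, and text escaping is omitted because the sample text needs none.

```python
import xml.etree.ElementTree as ET

# Per-target renderings of empty elements.  The output strings are
# assumptions made for this sketch, not taken from any real stylesheet.
EMPTIES = {
    "tex":   {"html": "TeX",   "latex": r"\TeX{}"},
    "latex": {"html": "LaTeX", "latex": r"\LaTeX{}"},
    "tld":   {"html": "~",     "latex": r"\textasciitilde{}"},
}


def render(elem, target):
    """Render flat mixed content whose children are all empty elements.
    Only element names are consulted; character data is copied untouched."""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(EMPTIES[child.tag][target])
        parts.append(child.tail or "")
    return "".join(parts)


para = ET.fromstring("<p>Use <latex/> on <tex/>; home is <tld/>/lib.</p>")
print(render(para, "html"))
print(render(para, "latex"))
```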

                                    -- Bill

Notes:

1.  There is one reasonable situation where descent into CDATA
*should* take place: math mode contents need to be thoroughly parsed
in translation to MathML from a document type that mathematical
authors will find tolerable.  But there is no issue of that type in
connection with http://math.albany.edu:8010/glf/lfaq.xml, although,
alas, one will find <tex/>, <latex/>, and <tld/> there.  I wonder how
some of these things would survive a double translation:

      gellmu/article ---(hypothetical)---> TEI ----> LaTeX .

2.  The default "article" document type for _regular_ GELLMU provides
a three-character name for each of the 33 printable but
non-alphanumeric ASCII characters.  Each of these characters is at
risk in some conceivable translation target.  But an author may simply
use one of these characters literally when it is safe for both LaTeX
and HTML.  And, for example, by default the syntactic translator
understands things like "\$" and "\{".  If the syntactic translator's
new internal verbatim (which becomes <verblist>, a list-like element)
is used (by calling gellmu-verblist, a front end for gellmu-trans),
then 32 of these 33 names (all but the one for ';') are auto-generated
from literal verbatim.  Something almost identical happens to literal
inline material like |*~$\| if "manmac" mode is enabled.
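The count in note 2 can be checked directly: printable ASCII runs from 0x20 (space) through 0x7E, 95 characters, and removing the 62 letters and digits leaves 33, the space included.  The Python sketch below also mimics the verbatim behaviour described in note 2 by replacing each of those characters except ';' with an empty-element reference.  The naming scheme ('c' plus the hex code, so '~' becomes <c7e/>) is hypothetical and is not GELLMU's own.

```python
import string

# Printable ASCII: 0x20 (space) through 0x7E, 95 characters in all.
PRINTABLE = [chr(code) for code in range(0x20, 0x7F)]
ALNUM = set(string.ascii_letters + string.digits)  # the 62 letters and digits
NON_ALNUM = [ch for ch in PRINTABLE if ch not in ALNUM]  # 33, space included


def element_name(ch):
    """Hypothetical name: 'c' plus the two-digit hex code (not GELLMU's scheme)."""
    return "c%02x" % ord(ch)


def verbatim_to_markup(text, skip=";"):
    """Replace each printable non-alphanumeric character not in `skip`
    with an empty-element reference; copy all other characters unchanged."""
    out = []
    for ch in text:
        if ch in NON_ALNUM and ch not in skip:
            out.append("<%s/>" % element_name(ch))
        else:
            out.append(ch)
    return "".join(out)


print(len(PRINTABLE), len(NON_ALNUM))
print(verbatim_to_markup("x;*~$"))
```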