## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Subject: XML vs. (La)TeX markup (was: XML, UTF-8 and TeX engines)
From: Joachim Schrod
Reply To: Mailing list for the LaTeX3 project
Date: Fri, 18 Jul 2003 03:16:32 +0200

>>>>> "BV" == Boris Veytsman writes:

JS> I still like its markup syntax much more than
JS> XML for several reasons that are really off topic here.

BV> Are you sure about the off topic part? I certainly would like to
BV> know these reasons.

OK, you've got me. The real answer would be a paper on the inability
to do pure semantic markup in most situations, and on tag economy,
i.e., the ratio of markup to text in a document. It would also cover
the necessity either to have full support from editing environments,
or to be able to enter and maintain markup manually in "standard"
editors. Perhaps a presentation at a TeX
conference... :-)

But then, I'll try a shorter answer:

-- Good typesetting needs "micromarkup", things that TeX does with
"~", "\ ", "\,", etc. One can imagine semantic markup for many of
these items, but the number of markup definitions and the
cognitive load of choosing the correct markup would be too high for
almost all authors. There are also cases where semantic markup
gets difficult; DEK's examples of ~ usage in the TeXbook illustrate
that well. Overall, I like the economy of input here:
"~", "\,", or "--" are easier to read and don't disturb the
input as much as &nbsp;, &spatium;, or &ndash;. Just imagine this
email written with XML entities... ;-) IMHO, the length of a tag
should be related to its importance: long tags for important
things, short tags for unimportant but necessary stuff.
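
(A minimal sketch of the micromarkup listed above. The sentences are
invented for illustration and are not taken from the TeXbook; only the
four markup characters come from the text.)

```latex
\documentclass{article}
\begin{document}
% "~" is a tie: an interword space where no line break may occur.
See Chapter~12 and Figure~3 for details.

% "\," is a thin space, here between a number and its unit.
The margin is 25\,mm wide.

% "--" produces an en dash, typically used for ranges.
The discussion spans pages 101--108.

% "\ " forces a normal interword space after the period of an
% abbreviation, instead of end-of-sentence spacing.
Compare this with, e.g.\ the entity notation.
\end{document}
```

Each of these is one or two keystrokes in the source, which is the
economy the item argues for.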

-- Space handling in TeX is more "natural" than in XML. Not in
macros, mind you, but in document text. As an example, I like to
be able to use blank lines to separate paragraphs, as you can see
in this email. This has been a markup convention for decades, and it
has proven to be useful. As another example, I also like that
multiple blanks collapse to one; the lack of that drives me mad in Word.
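
(A small sketch of the two space-handling rules just described, with
invented text. It assumes nothing beyond the standard article class.)

```latex
\documentclass{article}
\begin{document}
% Runs of blanks and a single line break in the source
% are both treated as one interword space.
This   sentence   has   many   blanks
and one line break in its source.

% The blank line above ends the first paragraph;
% no explicit paragraph tag is needed.
This is the second paragraph.
\end{document}
```

The typeset result is two ordinary paragraphs with normal word spacing.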

-- I like being able to introduce non-standard TeX
markup for special situations. E.g., in the TeX Directory
Structure (TDS) standard, we used markup like

\begin{tdsSummary}
bibtex/           \BibTeX{} input files
. bib/            \BibTeX{} databases
. . base/         base distribution (e.g., \path|xampl.bib|)
. . misc/         single-file databases
. <package>/      name of a package
\end{tdsSummary}

In the document source, the directory structure is much easier to
see than in equivalent XML markup, e.g.:

<tdsSummary>
<entry>
<directory>bibtex</directory>
<description>\BibTeX{} input files</description>
</entry>
[...]
<entry>
<directory><subdir/><variable>package</variable></directory>
<description>name of a package</description>
</entry>
</tdsSummary>

In the current source, one spots errors immediately (e.g., how
many <subdir/>s). That would be lost in XML markup. Of course, I'm
biased since I designed the markup and wrote the macros. :-) SGML
provided DATATAG for that, but this was thrown out of XML to make
parser writers' lives easier. Umpf, how many parser writers do we
have, compared with the number of authors?

-- TeX math markup is easier to write and to read than MathML.
Mathematicians can also use its flexibility to introduce arbitrary
new expressions in their "natural language math".
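
(A hedged sketch of what introducing a new expression can look like.
The macro name \norm is a made-up example, not something from this
thread; \lVert and \rVert come from the amsmath package.)

```latex
\documentclass{article}
\usepackage{amsmath}

% A hypothetical author-defined notation: the norm of an expression.
\newcommand{\norm}[1]{\lVert #1 \rVert}

\begin{document}
The triangle inequality reads
\[
  \norm{x + y} \le \norm{x} + \norm{y} .
\]
\end{document}
```

Once defined, the new notation is used like any built-in command, so
the source stays close to how the formula is spoken.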

-- Editor support for (La)TeX source input is better than for XML.
Actually, this is a very delicate and difficult topic that would
need a paper in itself. Please note that this reflects my current
view on the state of available tools; there's nothing to prevent
anybody from creating better XML editors -- they have been promised
for years, but they haven't arrived. Actually, there are good XML
document editors like FrameMaker; but they're not as
platform-independent as I would need them. (For the record, I
tried many editors, and currently use psgml-mode in XEmacs. But
it's not as good as AUC-TeX.)

-- An often cited reason to use XML markup instead of TeX is the
better support for validation and transformation of XML documents.
But IMHO this is overemphasized; it is not needed as often as we
discuss it. Most XML documents that I've seen do not even
conform to any schema, so one needs special transformation
scripts for more document classes than one expects at the start of
an XML project.

This is from my practical experience in introducing XML in
large multinational companies for mission-critical documents.
There it was very hard to achieve agreement even on structures for
formal documents like service level agreements -- the ad-hoc
markup that may be used for informal documents is good for
nothing. Hell, corporate users don't even use Word document styles
when they're available and prefer to click on their bold and
italics buttons or change the type size directly. That's the

Of course, there are XML validators out there -- one only has to
fight with the inability to express completely sensible document
structures in DTDs or schemas. The resulting document structure
definitions are either very complex or very generic. Style sheets
for complex schemas are very hard to write; that's one of
the reasons why we don't have good support for high-quality
DocBook output. Validation of very generic structures doesn't
bring enough advantage, since even valid documents can still be nonsense.

Last, but not least: If markup validation is really so important,
one can and should spend effort to make a TeX validator available.
There are several TeX parser implementations out there -- I wrote
one myself in two weeks. (Btw, presented at the TUG conference in
Santa Barbara, years ago.) They can be put to use with reasonable
effort.

-- Actually, IMO the main disadvantage of TeX markup is the shortage
of skilled people in the job market to implement that markup.
That makes any manager worth his salary shy away from TeX. For me,
that's the main reason to use XML: I find more people with the
needed skills.

But it's late and I should stop here. I hope you got an impression of
my viewpoint. As I've written above, a full elaboration is beyond the
scope of this email discussion.

Cheers,
Joachim

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
How do we persuade new users that spreading fonts across the page