LATEX-L Archives

Mailing list for the LaTeX3 project


Martin Hensel <[log in to unmask]>
Reply To: Mailing list for the LaTeX3 project <[log in to unmask]>
Mon, 7 Jul 2003 16:31:16 +0100
> These paragraphs made quite clear that the author didn't know a
> thing about TeX constraints (and is erroneous about space handling
> in HTML and XML as well). Obviously somebody who is new to
> technical details of existing markup languages.
> So the probability to find something worthwhile in the rest of the
> text was not high enough to spend the time reading further.

Could you please explain to me where I'm wrong about HTML and XML?

I wrote:
,-----[ syntax.pdf ]-----
| In languages like HTML, XML, and most programming languages spaces
| are treated as follows: Line breaks are considered as spaces,
| two or more spaces are considered as a single space.
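For concreteness, the rule quoted from syntax.pdf can be sketched in Python. This only illustrates the rule as literally stated (spaces and line breaks; tabs are not mentioned), and collapse_spaces is a hypothetical helper, not part of any library:

```python
import re

def collapse_spaces(text: str) -> str:
    """Apply the syntax.pdf rule as literally stated (hypothetical helper):
    line breaks count as spaces, and runs of spaces become one space."""
    # Treat each line break (CRLF, CR or LF) as a single space.
    text = re.sub(r"\r\n|\r|\n", " ", text)
    # Collapse two or more consecutive spaces into one.
    return re.sub(r" {2,}", " ", text)

print(collapse_spaces("two  spaces\nand a\r\nline break"))
# two spaces and a line break
```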

The HTML specification
,-----[ ]-----
| only the following characters are defined as white space
| characters:
| - ASCII space (&#x0020;)
| - ASCII tab (&#x0009;)
| - ASCII form feed (&#x000C;)
| - Zero-width space (&#x200B;)
| Line breaks are also white space characters.
: ...
| For all HTML elements except PRE, sequences of white space
| separate "words" (we use the term "word" here to mean "sequences
| of non-white space characters"). When formatting text, user agents
| should identify these words and lay them out according to the
| conventions of the particular written language (script) and target
| medium.
: ...
| For example, in Latin scripts, inter-word space is typically
| rendered as an ASCII space (&#x0020;),
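Read literally, the quoted HTML passage splits text into words at runs of the listed white space characters and, for Latin scripts, separates those words by an ASCII space. A rough Python sketch of that reading, ignoring PRE; html_words and render_latin are hypothetical names, not part of any HTML tool:

```python
import re

# A "word" is a run of characters that are not HTML white space.
# White space = the quoted list (space, tab, form feed, zero-width
# space) plus line breaks (CR and LF).
HTML_WORD = re.compile(r"[^\x20\t\f\u200b\r\n]+")

def html_words(text: str) -> list[str]:
    """Split text into 'words' as the quoted HTML passage defines them."""
    return HTML_WORD.findall(text)

def render_latin(text: str) -> str:
    """Lay the words out with one ASCII space between them (Latin scripts)."""
    return " ".join(html_words(text))

print(render_latin("Hello,\t\tworld\n\u200bagain"))
# Hello, world again
```

Under this reading, Latin text ends up with single spaces between words, whatever run of white space separated them in the source.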

The XML specification
,-----[ ]-----
| S (white space) consists of one or more space (#x20) characters,
| carriage returns, line feeds, or tabs.
| White Space
|    [3]    S    ::=    (#x20 | #x9 | #xD | #xA)+
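The XML production [3] can be transcribed directly into a regular expression. A minimal sketch (XML_S and is_xml_s are hypothetical names); note that, unlike the quoted HTML list, it accepts neither form feed nor zero-width space:

```python
import re

# Production [3] from the quoted XML passage: S ::= (#x20 | #x9 | #xD | #xA)+
XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")

def is_xml_s(text: str) -> bool:
    """True if the whole string matches the S production."""
    return XML_S.fullmatch(text) is not None

print(is_xml_s(" \t\r\n"), is_xml_s("\f"))
# True False
```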