LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:
From:
Martin Hensel <[log in to unmask]>
Reply To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Date:
Mon, 7 Jul 2003 16:31:16 +0100
Content-Type:
text/plain
Parts/Attachments:
text/plain (51 lines)
> These paragraphs made quite clear that the author didn't know a
> thing about TeX constraints (and is erroneous about space handling
> in HTML and XML as well). Obviously somebody who is new to
> technical details of existing markup languages.
>
> So the probability to find something worthwhile in the rest of the
> text was not high enough to spend the time reading further.

Could you please explain to me where I'm wrong about HTML and XML?

I wrote:
,-----[ syntax.pdf ]-----
| In languages like HTML, XML, and most programming languages spaces
| are treated as following: Line breaks are considered as spaces,
| two or more spaces are considered as a single space.
`-----

The HTML specification
,-----[ http://www.w3.org/TR/html401/struct/text.html ]-----
| only the following characters are defined as white space
| characters:
| - ASCII space (&#x0020;)
| - ASCII tab (&#x0009;)
| - ASCII form feed (&#x000C;)
| - Zero-width space (&#x200B;)
| Line breaks are also white space characters.
: ...
| For all HTML elements except PRE, sequences of white space
| separate "words" (we use the term "word" here to mean "sequences
| of non-white space characters"). When formatting text, user agents
| should identify these words and lay them out according to the
| conventions of the particular written language (script) and target
| medium.
: ...
| For example, in Latin scripts, inter-word space is typically
| rendered as an ASCII space (&#x0020;),
`-----
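
To make the quoted rule concrete, here is a minimal Python sketch of the behaviour described in the excerpt: any run of the listed white space characters (plus line breaks, taken here to be CR and LF) separates words and is rendered as a single ASCII space. The function names are hypothetical, and this only models the quoted text for non-PRE content; it is not a description of what any particular user agent actually does.

```python
import re

# White space characters listed in the HTML 4.01 excerpt:
# space, tab, form feed, zero-width space. Line breaks are
# assumed here to mean CR and LF.
_WHITE_SPACE_RUN = re.compile(r"[ \t\f\u200b\r\n]+")


def split_words(text: str) -> list[str]:
    """Split text into "words" (runs of non-white-space characters)."""
    return [w for w in _WHITE_SPACE_RUN.split(text) if w]


def collapse_white_space(text: str) -> str:
    """Replace each run of white space by a single ASCII space."""
    return _WHITE_SPACE_RUN.sub(" ", text)


if __name__ == "__main__":
    sample = "two  spaces\nand a\tline break"
    print(split_words(sample))
    print(collapse_white_space(sample))
```

Under these assumptions a line break and a double space both end up as one inter-word space, which is the behaviour the sentence from syntax.pdf describes.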

The XML specification
,-----[ http://www.w3.org/TR/REC-xml ]-----
| S (white space) consists of one or more space (#x20) characters,
| carriage returns, line feeds, or tabs.
|
| White Space
|    [3]    S    ::=    (#x20 | #x9 | #xD | #xA)+
`-----
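
Production [3] translates directly into a regular expression. The following Python sketch checks whether a string matches S; the function name is hypothetical, and the sketch covers only membership in S as quoted above, nothing more.

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+
_XML_S = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_xml_white_space(s: str) -> bool:
    """Return True if the whole string matches production [3] S."""
    return _XML_S.fullmatch(s) is not None


if __name__ == "__main__":
    for candidate in [" \t\r\n", "", "\f", " a "]:
        print(repr(candidate), is_xml_white_space(candidate))
```

Note that the production lists fewer characters than the HTML excerpt: form feed and the zero-width space do not appear in S.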



Martin
