LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Sender:
Mailing list for the LaTeX3 project <[log in to unmask]>
Subject:
From:
"William F. Hammond" <[log in to unmask]>
Date:
Sun, 11 Feb 2001 16:46:57 -0500
Reply-To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Parts/Attachments:
text/plain (38 lines)
Roozbeh Pournader <[log in to unmask]> writes:

> Also, many applications shipped with Windows 2000 attach a signature
> to the start of file (U+FEFF, Zero-Width No-Break Space) when they
> want to save the file, so that will make the autodetection much
> easier. The Unicode Standard accepts this as an autodetection
> mechanism, and says that this sequence (EF BB BF in UTF-8) is really
> improbable anywhere other than a UTF-8 file.

Such use of the byte sequence "EF BB BF" is a hack.  It has
probability $2^{-24}$ of appearing as the initial three-byte sequence
in a stream of random bytes.  In many locales it is even printable and
representable on screen, and who knows what it represents in someone
else's locale now or in the future.
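
To make the numbers concrete, here is a minimal Python sketch.  It
assumes uniformly random bytes for the probability figure and uses
Latin-1 as one example of a single-byte locale; the helper name
starts_with_utf8_signature is made up for illustration.

```python
import codecs

# UTF-8 encoding of U+FEFF, i.e. the three bytes EF BB BF.
UTF8_SIGNATURE = codecs.BOM_UTF8

def starts_with_utf8_signature(data: bytes) -> bool:
    """Return True if the byte string begins with EF BB BF."""
    return data.startswith(UTF8_SIGNATURE)

# Chance that three uniformly random bytes equal one fixed
# three-byte sequence: (1/256)^3, which is 2^-24.
random_match_probability = 1 / 256 ** 3

# In a single-byte locale such as Latin-1 the same three bytes are
# ordinary printable characters (i-diaeresis, right guillemet,
# inverted question mark).
as_latin1 = UTF8_SIGNATURE.decode("latin-1")
```

So a positive match is only a strong hint under the random-bytes
assumption, and the bytes are not inherently meaningless elsewhere.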

> Although, I do not have a good experience with that, I don't like my
> HTML files becoming non-conformant according to Unix checkers I have.

Under the rules, non-conforming XHTML (next-generation HTML) is supposed
to be rejected by a conforming XML processor.  Invalid XHTML will have
a high probability of failing to convey the author's intent correctly.

The correct way to indicate utf-8 encoding is with something like

<?xml ... encoding="utf-8"?>

or in another context

Content-type: text/plain; charset="utf-8"

or (some day) \usepackage[utf8]{inputenc}

or ...

as appropriate in the context.
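
As a rough illustration of reading the declared encoding instead of
guessing from a signature, here is a Python sketch.  Both function
names are hypothetical, and the XML part is deliberately simplified:
it assumes an ASCII-compatible encoding and is not the full detection
procedure described in the XML specification.

```python
import re
from email.message import Message

# Simplified pattern for the encoding pseudo-attribute of an XML
# declaration at the very start of the data (assumption: the file
# is in an ASCII-compatible encoding).
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def encoding_from_xml_declaration(data: bytes):
    """Return the declared encoding in lower case, or None."""
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii").lower() if match else None

def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()
```

For example, charset_from_content_type('text/plain; charset="utf-8"')
yields 'utf-8', and a document with no declaration yields None, which
leaves the caller to fall back on whatever default the context defines.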

                                     -- Bill
