LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	Re: LaTeX's internal char representation (UTF8 or Unicode?)
From:	"William F. Hammond" <[log in to unmask]>
Reply To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Date:	Sun, 11 Feb 2001 16:46:57 -0500
Content-Type:	text/plain
Parts/Attachments:	text/plain (38 lines)

Roozbeh Pournader <[log in to unmask]> writes:

> Also, many applications shipped with Windows 2000 attach a signature
> to the start of the file (U+FEFF, Zero-Width No-Break Space) when
> saving it, so that will make the autodetection much
> easier. The Unicode Standard accepts this as an autodetection
> mechanism, and says that this sequence (EF BB BF in UTF-8) is really
> improbable anywhere other than a UTF-8 file.
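The signature-based autodetection described in the quoted paragraph can be sketched in Python as below. This is a minimal illustration, not code from any LaTeX or Windows tool; the helper name `sniff_utf8_signature` is hypothetical. It relies only on the standard library's `codecs.BOM_UTF8` constant and the `utf-8-sig` codec, which strips a leading signature when one is present and otherwise decodes as plain UTF-8.

```python
import codecs

def sniff_utf8_signature(data: bytes):
    """Hypothetical helper: report whether `data` begins with the UTF-8
    signature (U+FEFF encoded as EF BB BF) and return the decoded text."""
    has_signature = data.startswith(codecs.BOM_UTF8)
    # 'utf-8-sig' drops a leading EF BB BF if present; otherwise it
    # behaves exactly like the plain 'utf-8' codec.
    text = data.decode("utf-8-sig")
    return has_signature, text

# U+FEFF encodes to the three bytes EF BB BF in UTF-8.
assert "\ufeff".encode("utf-8") == codecs.BOM_UTF8 == b"\xef\xbb\xbf"

print(sniff_utf8_signature(b"\xef\xbb\xbfhello"))  # (True, 'hello')
print(sniff_utf8_signature(b"hello"))              # (False, 'hello')
```

Note that the check only tells you the signature is present; as the reply below argues, its absence says nothing about whether the file is UTF-8.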

Such use of the byte sequence "EF BB BF" is a hack.  It has
probability $2^{-24}$ of occurring as the initial three-byte sequence
in a stream of random bytes.  In many locales it is even printable and
displayable on screen, and who knows what it represents in someone
else's locale now or in the future.
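The $2^{-24}$ figure is simple arithmetic: each of the three bytes matches with probability 1/256, so the chance is $(1/256)^3 = 2^{-24}$, roughly 6 in 100 million. The sketch below checks that number with exact fractions and shows how the same bytes look when read as ISO-8859-1 (Latin-1), one locale in which all three are printable. It uses only the Python standard library.

```python
from fractions import Fraction

SIGNATURE = b"\xef\xbb\xbf"  # U+FEFF encoded in UTF-8

# Probability that three independent, uniformly random bytes equal EF BB BF.
p = Fraction(1, 256) ** len(SIGNATURE)
assert p == Fraction(1, 2 ** 24)
print(p, f"{float(p):.3e}")  # 1/16777216 5.960e-08

# Read as ISO-8859-1, each byte maps to a printable character.
print(SIGNATURE.decode("latin-1"))  # ï»¿
```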

> That said, I have not had a good experience with it: I don't like my
> HTML files becoming non-conformant according to the Unix checkers I use.

Under the XML rules, XHTML (next-generation HTML) that is not
well-formed must be rejected by a conforming XML processor.  Invalid
XHTML has a high probability of failing to convey the author's intent
correctly.
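As a small illustration of that rejection, Python's standard-library XML parser refuses a fragment whose tags do not nest. This is a sketch assuming `xml.etree.ElementTree` as the "conforming XML processor"; it is a non-validating parser, so it checks well-formedness only and does not validate against the XHTML DTD. The helper `is_well_formed` is a hypothetical name introduced for this example.

```python
import xml.etree.ElementTree as ET

well_formed = '<p xmlns="http://www.w3.org/1999/xhtml"><b>bold</b></p>'
malformed = '<p xmlns="http://www.w3.org/1999/xhtml"><b>bold</p>'  # <b> never closed

def is_well_formed(markup: str) -> bool:
    """Hypothetical helper: True if the stdlib XML parser accepts `markup`."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # A well-formedness error is fatal: the parser stops and reports it.
        return False
    return True

print(is_well_formed(well_formed))  # True
print(is_well_formed(malformed))    # False
```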

The correct way to indicate UTF-8 encoding is to declare it explicitly, with something like

<?xml ... encoding="utf-8"?>

or in another context

Content-type: text/plain; charset="utf-8"

or (some day) \usepackage[utf8]{inputenc}

or ...

as appropriate in the context.
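Here is a sketch of software honouring two of those explicit declarations, using only the Python standard library: the XML parser reads the `encoding` pseudo-attribute from the declaration, and `email.message.Message.get_content_charset()` returns the `charset` parameter of a Content-Type header. The document content (`<note>café</note>`) is an invented example, and the XML declaration is written out in full because the "..." in the original is elided.

```python
import xml.etree.ElementTree as ET
from email.message import Message

# XML: the encoding declaration tells the parser how to decode the bytes.
xml_bytes = '<?xml version="1.0" encoding="utf-8"?><note>café</note>'.encode("utf-8")
root = ET.fromstring(xml_bytes)
print(root.text)  # café

# MIME: the charset parameter of the Content-Type header.
msg = Message()
msg["Content-Type"] = 'text/plain; charset="utf-8"'
print(msg.get_content_charset())  # utf-8
```

In both cases the encoding is stated by the author rather than guessed from the first bytes of the stream.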

                                     -- Bill

