LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

On Sun, 11 Feb 2001, Frank Mittelbach wrote:

> no i mean at the system level.

The Linux system library (which in practice means glibc) has many
functions and data types for Unicode support. As of version 2.2, glibc
has become Unicode-oriented in many ways. Support for basic UTF-8
operations was there long before 2.2.

> what do you mean by windows2000 autodetects
> them? my understanding of what UTF8 means as a format is that you can't
> autodetect it. As best you can detect that something is not UTF8, but how do
> you want to detect it as being in that format and not in, say, a file written
> with an 8bit inputencoding which happens to just contain an 8bit stream which
> is by chance also conforming to the UTF8 spec?

If it conforms by chance, you are really out of luck. Although it is very
simple to construct such an example by hand, finding a real non-UTF-8
document that is conformant by accident is almost impossible.
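
To make the argument concrete, here is a minimal sketch in Python of the
kind of check I mean. The function name looks_like_utf8 is made up for
this illustration, and this is only a strict-decoding heuristic, not what
Windows 2000 or any other system actually does.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode as strict UTF-8.

    Pure ASCII input also passes, since ASCII is a subset of UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Ordinary Latin-1 text with an accented letter fails the check:
# 0xE9 starts a three-byte sequence that is never completed.
latin1_text = "café".encode("latin-1")      # b"caf\xe9"

# A hand-made Latin-1 string that happens to be well-formed UTF-8:
# the bytes C3 A9 are "Ã©" in Latin-1 but "é" in UTF-8.
handmade = "Ã©".encode("latin-1")           # b"\xc3\xa9"

print(looks_like_utf8(latin1_text))  # False
print(looks_like_utf8(handmade))     # True
```

The second case is the hand-made kind of coincidence: every byte above
0x7F has to be part of a correctly formed multi-byte sequence, which
natural 8-bit text rarely manages.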

Also, many applications shipped with Windows 2000 write a signature at
the start of the file (U+FEFF, Zero-Width No-Break Space) when they save
it, which makes autodetection much easier. The Unicode Standard accepts
this as an autodetection mechanism, and says that this sequence
(EF BB BF in UTF-8) is very improbable anywhere other than in a UTF-8
file. That said, my own experience with it has not been good: I don't
like my HTML files becoming non-conformant according to the Unix
checkers I have.
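
For illustration, a small Python sketch of checking for that signature
and stripping it before handing the data to a tool that does not expect
it. The helper name split_utf8_signature is hypothetical; codecs.BOM_UTF8
is the standard library constant for the bytes EF BB BF.

```python
import codecs


def split_utf8_signature(data: bytes):
    """Return (has_signature, payload) for a byte string.

    has_signature is True when data starts with EF BB BF, the UTF-8
    encoding of U+FEFF; payload is data with that prefix removed.
    """
    if data.startswith(codecs.BOM_UTF8):
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


signed = codecs.BOM_UTF8 + "<html>".encode("utf-8")
print(split_utf8_signature(signed))      # (True, b'<html>')
print(split_utf8_signature(b"<html>"))   # (False, b'<html>')
```

Note that a missing signature says nothing either way: the file may
still be UTF-8, so this only helps when the writing application added it.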

--roozbeh