On Sun, 11 Feb 2001, Frank Mittelbach wrote:

> no i mean at the system level.

The Linux system library (which in practice means glibc) has many
functions and data types for Unicode support. As of version 2.2, glibc has
become Unicode-oriented in many ways. Basic UTF-8 operations were supported
long before 2.2.

> what do you mean by windows2000 autodetects
> them? my understanding of what UTF8 means as a format is that you can't
> autodetect it. As best you can detect that something is not UTF8, but how do
> you want to detect it as being in that format and not in, say, a file written
> with an 8bit inputencoding which happens to just contain an 8bit stream which
> is by chance also conforming to the UTF8 spec?

If it conforms by chance, you are really out of luck. Although it is very
simple to construct such an example by hand, finding a non-UTF-8 document
that conforms by accident is almost impossible.

Also, many applications shipped with Windows 2000 attach a signature to the
start of the file (U+FEFF, Zero Width No-Break Space) when they save it,
which makes autodetection much easier. The Unicode Standard accepts this as
an autodetection mechanism, and says that this sequence (EF BB BF in UTF-8)
is highly improbable anywhere other than in a UTF-8 file. I have not had
good experiences with it, though: I don't like my HTML files becoming
non-conformant according to the Unix checkers I have.

--roozbeh
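
For illustration, the detection heuristic discussed above can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not how Windows 2000 or any particular application does it: the function name `guess_encoding` and its four return labels are hypothetical, and the check relies on Python's strict built-in UTF-8 decoder to decide whether a byte stream conforms.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: classify bytes as 'utf-8-bom', 'ascii',
    'utf-8' or 'not-utf-8'. A heuristic, not a proof of the encoding."""
    # The signature U+FEFF is encoded as EF BB BF in UTF-8.
    has_signature = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_signature else data

    try:
        # Strict decoding rejects stray lead bytes, bad continuation
        # bytes, overlong forms and encoded surrogates.
        body.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf-8"

    if has_signature:
        return "utf-8-bom"
    if all(byte < 0x80 for byte in body):
        # Pure 7-bit data is valid UTF-8 but tells us nothing more.
        return "ascii"
    return "utf-8"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))    # well-formed UTF-8
    print(guess_encoding("café".encode("latin-1")))  # lone E9 byte is rejected
    print(guess_encoding(b"\xef\xbb\xbfhello"))      # signature present
    # Hand-made ambiguity: valid Latin-1 ("Ã©") that is also valid UTF-8 ("é").
    print(guess_encoding(b"\xc3\xa9"))
```

The last call shows the hand-made case from the discussion: the two bytes C3 A9 are legal in an 8-bit encoding such as Latin-1, yet they also form a well-formed UTF-8 sequence, so the heuristic reports UTF-8. Ordinary 8-bit text, like the Latin-1 "café" above, fails the check almost immediately.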