Roozbeh Pournader writes:

> Also, many applications shipped with Windows 2000 attach a signature
> to the start of file (U+FEFF, Zero-Width No-Break Space) when they
> want to save the file, so that will make the autodetection much
> easier. The Unicode Standard accepts this as an autodetection
> mechanism, and says that this sequence (EF BB BF in UTF-8) is really
> improbable anywhere other than a UTF-8 file.

Such use of the byte sequence "EF BB BF" is a hack. It has
probability $2^{-24}$ as the initial three-byte sequence in a stream
of random bytes. In many locales it is even printable and screen
representable, and who knows what it represents in someone else's
locale now or in the future.

> Although, I do not have a good experience with that, I don't like my
> HTML files becoming non-conformant according to Unix checkers I have.

Under the rules, non-conforming XHTML (next-generation HTML) is
supposed to be rejected by a conforming XML processor. Invalid XHTML
has a high probability of failing to convey the author's intent
correctly.

The correct way to indicate utf-8 encoding is with something like

    <?xml ... encoding="utf-8"?>

or in another context

    Content-type: text/plain; charset="utf-8"

or (some day)

    \usepackage[utf8]{inputenc}

or ... as appropriate in the context.

                                    -- Bill
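
To make the distinction concrete, here is a minimal Python sketch of a reader that trusts an explicit XML encoding declaration first and treats a leading EF BB BF only as a guess. The helper name declared_encoding, the regular expression, and the labels it returns are hypothetical, made up for illustration; this is not how any particular XML processor or the Unicode Standard specifies detection.

```python
import codecs
import re

# Hypothetical pattern: an XML declaration at the very start of the data
# that carries an encoding pseudo-attribute. \x27 is the single quote.
XML_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\x27]([A-Za-z][A-Za-z0-9._-]*)["\x27]'
)


def declared_encoding(data: bytes):
    """Return (encoding, source), preferring an explicit label over a BOM."""
    has_bom = data.startswith(codecs.BOM_UTF8)  # b'\xef\xbb\xbf'
    body = data[len(codecs.BOM_UTF8):] if has_bom else data
    match = XML_DECL.match(body)
    if match:
        # An explicit declaration is a statement by the author.
        return match.group(1).decode("ascii").lower(), "xml-declaration"
    if has_bom:
        # The signature alone is only a heuristic hint.
        return "utf-8", "bom-guess"
    return None, "unknown"


# Chance that the first three bytes of a random stream are EF BB BF.
BOM_COLLISION_PROBABILITY = 1 / 256 ** 3
```

Under these assumptions, a file with a declaration reports "xml-declaration" as its source, a bare signature reports "bom-guess", and anything else is left for the surrounding context (for example a Content-type header) to settle.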