LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:
From:
Jörg Knappen <[log in to unmask]>
Reply To:
Mailing list for the LaTeX3 project <[log in to unmask]>
Date:
Tue, 17 Mar 2009 08:55:18 +0100
Unicode has a feature called the BOM (byte order mark), which is a single character prepended to
a Unicode file. Thus, if we know that the file is in some Unicode encoding, we can detect which
one it is by checking for the byte order mark. If there is no byte order mark, assume ASCII or
some non-Unicode 8-bit encoding.

This way a Unicode-aware engine can detect the encoding automatically, at least in part.
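
As a rough illustration of the detection described above, here is a minimal Python sketch (not TeX code, and not a proposal for how an engine should implement it). The function name detect_encoding and the "latin-1" fallback are assumptions made for the example; the message only says to assume ASCII or some 8-bit encoding when no byte order mark is present. The BOM constants come from Python's standard codecs module.

```python
import codecs

# Known byte order marks. The UTF-32 marks are checked first because the
# UTF-32 LE mark (FF FE 00 00) begins with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_encoding(head: bytes, fallback: str = "latin-1") -> str:
    """Guess the encoding from the first bytes of a file.

    `fallback` is a hypothetical default standing in for "ASCII or some
    non-Unicode 8-bit encoding"; the right choice depends on the setup.
    """
    for bom, name in _BOMS:
        if head.startswith(bom):
            return name
    return fallback
```

Reading the first four bytes of a file is enough, e.g. detect_encoding(open(path, "rb").read(4)), where path is whatever input file is being examined. A file without a byte order mark always falls through to the fallback, which is why the detection is only partial.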

The problem is that checking for the byte order mark introduces an incompatible
change to TeX, since, e.g., the UTF-16 byte order mark looks like
