> BOMs?

Byte Order Mark. It is mainly there for UTF16, to distinguish between the big- and little-endian flavours, but Microsoft tools in particular tend to stick one on utf8 files as well. I don't think anything special needs to be done for these: if the BOM isn't recognised as a BOM, it will be recognised as ZERO WIDTH NO-BREAK SPACE (U+FEFF), and for a typesetting system there isn't really a lot that needs to be done with that. The exception, of course, is the top-level file, where the utf8 handling may not be set up early enough, and typesetting even zero-width characters before \documentclass doesn't work.

More serious problems (serious enough to make me wonder whether it's worth the effort of supporting utf8 in standard TeX) are combining characters. In xmltex you can make these work by making every possible base character active and looking ahead for a following combiner, but that is turned off by default as it's neither fast nor robust. In LaTeX you can't do much other than make a combining accent generate an error, since you can't really make the base ASCII letters active when the markup relies on \abc-style command names.

It's easy to make a prepass with (say) perl to get rid of the combining characters and replace them with TeX accent markup, but if you are doing that you may as well replace all of the utf8 (and utf16) with traditional TeX markup. This is slightly less portable but a whole lot more robust than doing it in TeX.

The second thing that I have never really fixed in xmltex in this area is that mapping each input character to an internal csname, which you then map to a typesetting instruction, is fine for supporting small European-based character sets, but it soon gets to be a pain if you are supporting large Asian character sets. The CJK package's utf8 support has an option to map utf8-encoded input straight to a set of 8-bit fonts whose encodings are arranged to map easily from utf8.
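As a rough illustration of that scheme (a sketch under stated assumptions, not the CJK package's actual code): for characters in the Basic Multilingual Plane, the high byte of the code point can select one of 256 8-bit subfonts and the low byte the slot within it, and both fall out of the utf8 bytes with a few masks and shifts. The function name `subfont_from_utf8` and the subfont prefix `uni` are hypothetical, made up for the example.

```python
def subfont_from_utf8(seq: bytes, prefix: str = "uni") -> tuple[str, int]:
    """Map one UTF-8 encoded BMP character to (subfont name, slot).

    Hypothetical naming scheme: prefix + high byte of the code point in hex.
    Overlong forms and surrogates are not rejected; this is only a sketch.
    """
    # Every byte after the first must be a continuation byte (10xxxxxx).
    if any(not 0x80 <= c < 0xC0 for c in seq[1:]):
        raise ValueError("malformed continuation byte")
    if len(seq) == 1 and seq[0] < 0x80:
        # 0xxxxxxx: plain ASCII
        cp = seq[0]
    elif len(seq) == 2 and 0xC0 <= seq[0] < 0xE0:
        # 110xxxxx 10xxxxxx
        cp = ((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F)
    elif len(seq) == 3 and 0xE0 <= seq[0] < 0xF0:
        # 1110xxxx 10xxxxxx 10xxxxxx
        cp = ((seq[0] & 0x0F) << 12) | ((seq[1] & 0x3F) << 6) | (seq[2] & 0x3F)
    else:
        raise ValueError("not a 1-3 byte UTF-8 sequence (BMP only in this sketch)")
    # High byte picks the 8-bit font, low byte picks the character slot in it.
    return f"{prefix}{cp >> 8:02x}", cp & 0xFF
```

For example, `subfont_from_utf8("中".encode("utf-8"))` (U+4E2D, bytes E4 B8 AD) gives `('uni4e', 45)`: slot 0x2D of subfont `4e`, with no per-character csname table involved.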
This seems much more reasonable for supporting large Unicode fonts: split them up into 8-bit fonts so TeX can see them, and map trivially to the right font/character from the utf8 sequences. I never got this working in xmltex, though (modifying anything in xmltex is a pain; it's not the most documented piece of code ever produced).

David
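A prepass of the kind described above (replacing combining characters with TeX accent markup) might look roughly like the following. The message suggests perl; Python is swapped in here, and the small accent table and the name `to_tex_markup` are assumptions for illustration only. The sketch NFD-normalises the text so precomposed letters such as é also become base plus combining accent, wraps the preceding base character in the matching TeX accent command, raises an error for combiners it has no markup for, and drops a leading BOM so nothing is typeset before \documentclass.

```python
import unicodedata

# Hypothetical, deliberately small table: combining accent -> TeX accent command.
COMBINING_TO_TEX = {
    "\u0300": "\\`",   # COMBINING GRAVE ACCENT
    "\u0301": "\\'",   # COMBINING ACUTE ACCENT
    "\u0302": "\\^",   # COMBINING CIRCUMFLEX ACCENT
    "\u0303": "\\~",   # COMBINING TILDE
    "\u0308": '\\"',   # COMBINING DIAERESIS
    "\u0327": "\\c",   # COMBINING CEDILLA
}

BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE when not treated as a BOM


def to_tex_markup(text: str) -> str:
    """Replace base+combining sequences with TeX accent markup (sketch)."""
    # Drop a leading BOM so nothing gets typeset before \documentclass.
    if text.startswith(BOM):
        text = text[1:]
    # Decompose precomposed letters (e.g. U+00E9) into base + combiner.
    text = unicodedata.normalize("NFD", text)
    out: list[str] = []
    for ch in text:
        if ch in COMBINING_TO_TEX and out:
            # Wrap the previous output item; stacked accents nest.
            base = out.pop()
            out.append(f"{COMBINING_TO_TEX[ch]}{{{base}}}")
        elif unicodedata.combining(ch):
            # Unknown combiner, or a combiner with no base before it.
            raise ValueError(f"no TeX markup for combining character U+{ord(ch):04X}")
        else:
            out.append(ch)
    return "".join(out)
```

For example, `to_tex_markup("caf\u00e9")` returns `caf\'{e}`. Doing this outside TeX avoids making the ASCII letters active, which is the robustness argument made above.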