LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

On 10.02.2010 at 19:56, Joseph Wright wrote:

> Hello Chris,
> 
>> Input is not the only place where character-like things appear in TeX; this is another way of saying what Lars said.  Character repertoires are distinct from encodings of characters and these are different again from the encodings used in external files.
>> 
>> So you need to know what character repertoires you are going to deal with internally in these various types of string, whether or not these are represented by, for example, 7-bit LICRs.
> 
> I was thinking of input encodings, where my point was supposed to be that I hope we can avoid something like the inputenc "utf8" approach, as there are better solutions (in the form of engines which deal with the issue). (Of course, that leaves UTF-16 issues, but I'd hope that engine developments can help out.)

Current implementation strategies for strings in development environments define a single Unicode encoding scheme that is used exclusively for internal processing, and define "strings" as sequences of code units in that scheme. The scheme is UTF-16 in nearly all cases (Windows, Java, Python, Qt, .NET, COM, Cocoa, Carbon); a few technologies such as Gnome and Emacs choose UTF-8 instead. LaTeX could do the same, depending on the engine: UTF-8 for pdfTeX, UTF-16 for XeTeX. Other possibilities (e.g. LICR or UTF-32) are probably either too complicated or not flexible enough.
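
To make "a string is a sequence of code units" concrete, here is a small Python sketch, purely for illustration and not a proposal for how LaTeX would implement it. The helper name code_units and the sample text are made up for this example; only the standard str.encode and int.from_bytes calls are used.

```python
def code_units(text, scheme):
    """Return the code units of `text` as a list of integers.

    `scheme` is "utf-8" (8-bit code units) or "utf-16" (16-bit code units).
    This helper is hypothetical and exists only for this illustration.
    """
    if scheme == "utf-8":
        # Each byte of the UTF-8 encoding is one code unit.
        return list(text.encode("utf-8"))
    if scheme == "utf-16":
        # Encode big-endian without a BOM, then group bytes into 16-bit units.
        data = text.encode("utf-16-be")
        return [int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2)]
    raise ValueError("unsupported scheme: " + scheme)


# Three characters: ASCII "a", U+00E4 (a with diaeresis), and U+1D538,
# which lies outside the Basic Multilingual Plane.
sample = "a\u00e4\U0001D538"

utf8_units = code_units(sample, "utf-8")
utf16_units = code_units(sample, "utf-16")

print(len(sample), len(utf8_units), len(utf16_units))
print([hex(u) for u in utf16_units])
```

The same three characters occupy seven code units in UTF-8 but four in UTF-16, where the character outside the BMP becomes a surrogate pair. Under either choice, the "length" of a string in code units is not the number of characters, which is the kind of thing any string interface built on code units would have to document.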