LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Date: Wed, 10 Feb 2010 22:18:54 +0100
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
From: Philipp Stephani <[log in to unmask]>
On 10.02.2010 at 19:56, Joseph Wright wrote:

> Hello Chris,
> 
>> Input is not the only place where character-like things appear in TeX; this is another way of saying what Lars said.  Character repertoires are distinct from encodings of characters and these are different again from the encodings used in external files.
>> 
>> So you need to know what character repertoires you are going to deal with internally in these various types of string, whether or not these are represented by, for example, 7-bit LICRs.
> 
> I was thinking of input encodings, where my point was (supposed to be) that something like the inputenc "utf8" approach is one I hope we can avoid, as there are better solutions (in the form of engines which deal with the issue). (Of course, that leaves UTF-16 issues, but I'd hope that engine developments can help out.)

Current implementation strategies for strings in development environments define one Unicode encoding scheme that is used exclusively for internal processing (UTF-16 in nearly all cases, such as Windows, Java, Python, Qt, .NET, COM, Cocoa, and Carbon; a few technologies, such as Gnome and Emacs, choose UTF-8 instead), and define "strings" as sequences of UTF-16 or UTF-8 code units. LaTeX could do the same, depending on the engine: UTF-8 for pdfTeX, UTF-16 for XeTeX. Other possibilities (e.g. LICR or UTF-32) are probably either too complicated or not flexible enough.
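
To make "sequence of code units" concrete, here is a minimal sketch in Python (chosen only for illustration; nothing here is LaTeX3 code, and the helper names utf8_code_units and utf16_code_units are hypothetical). It shows how the same three characters become a different number of code units depending on whether UTF-8 or UTF-16 is the internal form:

```python
def utf8_code_units(text):
    """Return the UTF-8 code units (8-bit values) that encode text."""
    return list(text.encode("utf-8"))


def utf16_code_units(text):
    """Return the UTF-16 code units (16-bit values) that encode text."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    # Group the bytes pairwise into 16-bit code units.
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# 'a' (U+0061), 'e' with acute accent (U+00E9), musical G clef (U+1D11E)
sample = "a\u00e9\U0001d11e"

print([hex(u) for u in utf8_code_units(sample)])
print([hex(u) for u in utf16_code_units(sample)])
```

A "string length" defined on code units would therefore be 7 under UTF-8 and 4 under UTF-16 for this sample, so any string interface built this way would presumably have to say which unit it counts.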
