LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	Re: LaTeX3 8-bit only?
From:	Philipp Stephani
Reply To:	Mailing list for the LaTeX3 project
Date:	Wed, 10 Feb 2010 22:18:54 +0100
Content-Type:	text/plain
Parts/Attachments:	text/plain (12 lines)

On 10 Feb 2010, at 19:56, Joseph Wright wrote:

> Hello Chris,
> 
>> Input is not the only place where character-like things appear in TeX; this is another way of saying what Lars said.  Character repertoires are distinct from encodings of characters and these are different again from the encodings used in external files.
>> 
>> So you need to know what character repertoires you are going to deal with internally in these various types of string, whether or not these are represented by, for example, 7-bit LICRs.
> 
> I was thinking of input encodings, where my point was (supposed to be) that something like the inputenc "utf8" approach is one I hope we can avoid, as there are better solutions (in the form of engines which deal with the issue). (Of course, that leaves UTF-16 issues, but I'd hope that engine developments can help out.)

Current string implementations in development environments settle on a single Unicode encoding scheme (UTF-16 in nearly all cases, e.g. Windows, Java, Python, Qt, .NET, COM, Cocoa, Carbon; a few technologies such as GNOME and Emacs choose UTF-8 instead), use it exclusively for internal processing, and define "strings" as sequences of UTF-16 or UTF-8 code units. LaTeX could do the same, depending on the engine: UTF-8 for pdfTeX, UTF-16 for XeTeX. Other possibilities (e.g. LICR or UTF-32) are probably either too complicated or not flexible enough.
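
To make the "sequence of code units" view concrete, here is a minimal Python sketch that lists the UTF-8 and UTF-16 code units for the same text. The helper names utf8_code_units and utf16_code_units are hypothetical, chosen only for this illustration; nothing here reflects how any TeX engine stores strings internally.

```python
def utf8_code_units(text):
    """Return the UTF-8 code units (8-bit values) that encode `text`."""
    return list(text.encode("utf-8"))


def utf16_code_units(text):
    """Return the UTF-16 code units (16-bit values) that encode `text`."""
    data = text.encode("utf-16-be")  # big-endian, so no byte order mark is emitted
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Four characters: a, U+00E4, U+20AC, and U+1D11E (outside the Basic Multilingual Plane).
sample = "a\u00e4\u20ac\U0001D11E"

print([hex(u) for u in utf8_code_units(sample)])
print([hex(u) for u in utf16_code_units(sample)])
```

The four characters take ten UTF-8 code units but only five UTF-16 code units, and the last character needs a surrogate pair in UTF-16. In both cases the length of a "string" in code units differs from its length in characters, which is the trade-off any code-unit-based string definition has to live with.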
