LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Sender:	Mailing list for the LaTeX3 project <[log in to unmask]>
Subject:	Re: Multilingual Encodings Summary 2.2
From:	Lars Hellström <[log in to unmask]>
Date:	Mon, 14 May 2001 15:05:35 +0200
In-Reply-To:	<[log in to unmask]>
Reply-To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Parts/Attachments:	text/plain (64 lines)

At 12.18 +0200 2001-05-14, jbezos wrote:
>Lars wrote
>> >For example, removing the fi ligature in Turkish. Or using an alternate
>> >orthography in languages with contextual analysis.
>>
>> That doesn't seem like metric transformations to me, but more like
>
>Actually, they aren't, but for some reason Knuth
>very likely understands, this information is included
>in the tfm files (text font *metrics*).

Don't overestimate Knuth's foresight here. He needed some place to store
the font-related information TeX would need, so he simply packed all kinds
of information into a single file.

>>   There is some concern that unifying Han characters may lead to confusion
>>   because they are sometimes used differently by the various East Asian
>>   languages. Computationally, Han character unification presents no more
>>   difficulty than employing a single Latin character set that is used to
>>   write languages as different as English and French.
>>
>> If they are not different in Unicode then there probably is no reason to
>> make them different in LaTeX either.
>
>As far as Unicode is concerned, that's right because
>Unicode doesn't deal with glyphs at all; but when we
>have to select a glyph from a font we need some
>additional information. (And even Unicode 3.1
>provides tag characters for protocols not using
>"text" tags like XML or LaTeX.)

If Unicode doesn't consider them to be distinct characters, then I see no
reason why LaTeX should. In general, we don't (as users of TeX or some
extension) select a glyph from a font, as the font is already a mapping
from characters to glyphs. If the user wants a specific rendering of a
character then he should choose a font where the character is rendered in
that way, not request that each font should provide all alternative
renderings.
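The point can be made concrete with a toy model in which a font is simply a mapping from characters to glyphs. The choice between regional renderings of a unified Han character is then made by picking the font, not by the character code. This is a Python sketch under that assumption. The font dictionaries, the glyph names (`uni76F4.jp`, `uni76F4.cn`) and the `render` function are hypothetical and do not correspond to any real font format.

```python
from typing import Dict, List

# Hypothetical model: a font maps each character to the glyph that renders it.
Font = Dict[str, str]

# Two hypothetical fonts covering the same unified Han character U+76F4.
# The glyph names are invented for illustration only.
japanese_style_font: Font = {"\u76f4": "uni76F4.jp"}
chinese_style_font: Font = {"\u76f4": "uni76F4.cn"}


def render(text: str, font: Font) -> List[str]:
    """Look up each character in the chosen font; the font decides the glyph."""
    return [font[ch] for ch in text]
```

Here `render("\u76f4", japanese_style_font)` yields `['uni76F4.jp']`, while the same one-code-point string set in `chinese_style_font` yields `['uni76F4.cn']`. The text itself carries no regional variant; only the font choice differs.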

You may want to compare with the situation in the Latin script some
200--300 years ago. Some languages (e.g. French) were always set in
antiqua, whereas others (e.g. German) were always set in fraktur. Had
computers existed back then, there would probably have been tag characters
for selecting antiqua or fraktur in the encodings used, but there would
have been little point in having distinct code points for the antiqua and
fraktur alphabets. Math is of course an exception, since the corresponding
letters aren't semantically equivalent in mathematical formulae.

>> Why should there exist characters which are not encoded using Unicode en
>> route from the mouth to the stomach, if we're anyway using Unicode for e.g.
>> hyphenation?
>
>Provided we are using Unicode for hyphenation.
>This is one of the main problems of TeX -- hyphenation
>depends on the font encoding (?).

You have to encode the hyphenation patterns somehow. As Unicode will cover
all known scripts it can be used as a universal encoding. Furthermore I
thought that there were OCPs (acting approximately at \shipout time) that
converted from Unicode to the actual font encodings when they are not the
same. Is this not correct?
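The arrangement described above, where text stays in Unicode up to output (so hyphenation patterns can also be written in Unicode) and only a final pass converts to the font's own encoding, can be modelled roughly as follows. This is a hypothetical Python sketch, not Omega's actual OCP mechanism. The table `HYPOTHETICAL_ENCODING` and the function `to_font_slots` are invented for illustration, and the slot numbers do not describe any real TeX font encoding.

```python
from typing import Dict, List

# Hypothetical 8-bit font encoding: Unicode code point -> slot in the font.
# The slot numbers are illustrative only, not those of any real TeX encoding.
HYPOTHETICAL_ENCODING: Dict[int, int] = {
    ord("a"): 0x61,
    ord("e"): 0x65,
    ord("\u00e9"): 0xE9,  # e with acute accent
    ord("\u00f6"): 0xF6,  # o with diaeresis
}


def to_font_slots(text: str, encoding: Dict[int, int]) -> List[int]:
    """Convert Unicode text to font slots as a final step before output.

    Everything upstream (input, hyphenation) is assumed to stay in Unicode;
    only this last step needs to know the font encoding.
    """
    slots: List[int] = []
    for ch in text:
        cp = ord(ch)
        if cp not in encoding:
            # Report characters the font cannot represent instead of dropping them.
            raise KeyError(f"U+{cp:04X} has no slot in this font encoding")
        slots.append(encoding[cp])
    return slots
```

For example, `to_font_slots("\u00e9a", HYPOTHETICAL_ENCODING)` returns `[0xE9, 0x61]`, and a character without a slot raises `KeyError`, so missing coverage in the font shows up rather than being silently lost.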

Lars Hellström
