LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Sender:	Mailing list for the LaTeX3 project <[log in to unmask]>
Subject:	Re: Multilingual Encodings Summary 2.2
From:	jbezos <[log in to unmask]>
Date:	Mon, 14 May 2001 12:18:23 +0200
Reply-To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Parts/Attachments:	text/plain (57 lines)

Lars wrote
> >For example, removing the fi ligature in Turkish. Or using an alternate
> >orthography in languages with contextual analysis.
>
> That doesn't seem like metric transformations to me, but more like

Actually, they aren't, but for reasons Knuth
very likely understands, this information is stored
in the TFM files (TeX Font *Metric* files).
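Ligatures are a case in point. In a TFM file they belong to the font's lig/kern program, so whether "f" followed by "i" becomes a single fi glyph is decided by the font metrics, not by the language being typeset. The Python sketch below models this with a hypothetical, much simplified ligature table (a dictionary from character pairs to a ligature glyph); real lig/kern programs work on the font's own character codes and support more operations. Under that assumption, suppressing the fi ligature for Turkish, where it would swallow the dot that distinguishes i from dotless ı, amounts to switching to a different table. The names `DEFAULT_LIGATURES`, `TURKISH_LIGATURES` and `apply_ligatures` are illustrative only.

```python
# Hypothetical, simplified model of a TFM-style ligature table:
# a mapping from a pair of input characters to one ligature glyph.
DEFAULT_LIGATURES = {
    ("f", "i"): "\ufb01",  # LATIN SMALL LIGATURE FI
    ("f", "l"): "\ufb02",  # LATIN SMALL LIGATURE FL
}

# Hypothetical Turkish variant: the same table without the fi ligature,
# so the dot of "i" stays visible next to "f".
TURKISH_LIGATURES = {
    pair: glyph for pair, glyph in DEFAULT_LIGATURES.items() if pair != ("f", "i")
}


def apply_ligatures(text, table):
    """Replace each adjacent character pair found in `table` by its ligature.

    Scans left to right; a character consumed by a ligature is not reused.
    """
    out = []
    i = 0
    while i < len(text):
        pair = (text[i], text[i + 1]) if i + 1 < len(text) else None
        if pair in table:
            out.append(table[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(apply_ligatures("fil", DEFAULT_LIGATURES))  # fi ligature used
    print(apply_ligatures("fil", TURKISH_LIGATURES))  # f and i kept apart
```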

>   There is some concern that unifying Han characters may lead to confusion
>   because they are sometimes used differently by the various East Asian
>   languages. Computationally, Han character unification presents no more
>   difficulty than employing a single Latin character set that is used to
>   write languages as different as English and French.
>
> If they are not different in Unicode then there probably is no reason to
> make them different in LaTeX either.

As far as Unicode is concerned, that's right, because
Unicode doesn't deal with glyphs at all; but when we
have to select a glyph from a font we need some
additional information. (Even Unicode 3.1 provides
tag characters for protocols that lack markup tags
of the kind XML or LaTeX use.)

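To illustrate the point about glyph selection: U+76F4 (直) is a single unified Han character, yet Chinese and Japanese typography draw it differently, so a typesetter needs to know the language to pick the right font. The Python sketch below encodes a language as Unicode 3.1 tag characters (U+E0001 LANGUAGE TAG followed by tag characters U+E0020 to U+E007E, which mirror printable ASCII; later Unicode versions deprecate this use), splits tagged text into language runs, and maps each run to a font. The font names and the helpers `encode_language_tag`, `split_language_runs` and `select_fonts` are hypothetical, chosen only for illustration.

```python
LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F    # U+E007F CANCEL TAG
TAG_OFFSET = 0xE0000    # tag characters U+E0020..U+E007E mirror ASCII 0x20..0x7E


def encode_language_tag(lang):
    """Return the tag-character sequence for an ASCII language code."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("language codes must be printable ASCII")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_OFFSET + ord(c)) for c in lang)


def split_language_runs(text):
    """Yield (language, run) pairs, treating tag sequences as language switches.

    Text before any tag, or after a CANCEL TAG, gets language None.
    """
    lang, run, i = None, [], 0
    while i < len(text):
        cp = ord(text[i])
        if cp in (LANGUAGE_TAG, CANCEL_TAG):
            if run:
                yield lang, "".join(run)
                run = []
            i += 1
            tag = []
            # Collect the tag characters that spell the language code.
            while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
                tag.append(chr(ord(text[i]) - TAG_OFFSET))
                i += 1
            lang = "".join(tag) if cp == LANGUAGE_TAG else None
        else:
            run.append(text[i])
            i += 1
    if run:
        yield lang, "".join(run)


# Hypothetical fonts: the same code point needs a different font, and so a
# different glyph shape, depending on the language of the run.
HAN_FONTS = {"zh": "HypotheticalSongFont", "ja": "HypotheticalMinchoFont"}


def select_fonts(text, default="HypotheticalDefaultFont"):
    """Pair each language run with the font chosen for its language."""
    return [(HAN_FONTS.get(lang, default), run) for lang, run in split_language_runs(text)]


if __name__ == "__main__":
    text = encode_language_tag("ja") + "\u76f4" + encode_language_tag("zh") + "\u76f4"
    print(select_fonts(text))
```

The same character comes out paired with two different fonts; the language information, not the code point, makes the difference.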
> >..., by doing so we are again creating a closed system
> >using its own conventions with no links with external tools adapted
> >to Unicode. I will be able to process a file and extract information

> Depends on what type of information it is. For information specifying the
> language almost certainly yes. If you want to move around information
> saying "the 8-bit characters in this piece of text should be interpreted
> according to the following input encoding" then I would say no (amongst
> other things because it would constitute a representation not known to
> other programs).
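The trouble with carrying an input-encoding declaration along with the text is that the bytes alone do not say which characters they are. In the Python sketch below (the helper name `to_unicode` is hypothetical; the codec names are Python's standard ones), the single byte 0xE9 decodes to é under Latin-1 but to a Cyrillic letter under KOI8-R. Once the input has been converted to Unicode at the boundary, later stages no longer need to know which encoding was used.

```python
def to_unicode(data: bytes, input_encoding: str) -> str:
    """Interpret 8-bit input once, at the input boundary (hypothetical helper)."""
    return data.decode(input_encoding)


if __name__ == "__main__":
    raw = bytes([0xE9])  # one 8-bit character: ambiguous without an encoding
    for encoding in ("latin-1", "koi8-r"):
        ch = to_unicode(raw, encoding)
        print(encoding, ch, f"U+{ord(ch):04X}")
```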

And I myself gave a good argument in favour of
that!

> Why should there exist characters which are not encoded using Unicode en
> route from the mouth to the stomach, if we're anyway using Unicode for e.g.
> hyphenation?

Provided we are using Unicode for hyphenation.
This is one of the main problems of TeX: hyphenation
depends on the font encoding (?).
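For comparison, TeX's pattern-based hyphenation (Liang's algorithm) can be run directly on Unicode characters. In 8-bit TeX the patterns are matched against the character codes of the current font, which is why the same language may need different pattern files for different font encodings. The Python sketch below applies Liang-style patterns to a Unicode string instead, so a pattern can mention a character such as ñ without reference to any font encoding. The single pattern in the example is hypothetical, not taken from a real Spanish pattern file, and `hyphenate` assumes its input is already lowercase.

```python
def parse_pattern(pattern):
    """Split a Liang-style pattern such as "1ñ" into ("ñ", [1, 0]).

    Each digit is the value of the gap before the following letter;
    an odd value allows a hyphen there, an even value forbids it.
    """
    letters, points = [], [0]
    for ch in pattern:
        if ch.isdigit():
            points[-1] = int(ch)
        else:
            letters.append(ch)
            points.append(0)
    return "".join(letters), points


def hyphenate(word, patterns, left_min=2, right_min=2):
    """Return the pieces of `word` between allowed hyphenation points."""
    table = dict(parse_pattern(p) for p in patterns)
    w = "." + word + "."               # "." marks the word boundaries
    points = [0] * (len(w) + 1)        # points[k]: value of the gap before w[k]
    for i in range(len(w)):
        for j in range(i + 1, len(w) + 1):
            values = table.get(w[i:j])
            if values:
                for k, v in enumerate(values):
                    points[i + k] = max(points[i + k], v)
    pieces = [""]
    for n, ch in enumerate(word):
        pieces[-1] += ch
        gap = points[n + 2]            # gap after word[n] is the gap before w[n + 2]
        if gap % 2 and left_min <= n + 1 <= len(word) - right_min:
            pieces.append("")
    return pieces


if __name__ == "__main__":
    print(hyphenate("ni\u00f1o", ["1\u00f1"]))  # hypothetical pattern: break before ñ
```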

> Exactly in what way normalization should be applied and when clearly needs
> further study.

Agreed.
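One reason normalization needs care is that the same visible text can have several Unicode spellings. In the Python sketch below, é written as the single code point U+00E9 and as e followed by U+0301 COMBINING ACUTE ACCENT compare unequal until both are brought to the same form with the standard `unicodedata.normalize`; a hyphenation pattern or font lookup keyed on one spelling would miss the other. The helper `same_text` and its NFC default are assumptions for illustration, not a recommendation from this thread.

```python
import unicodedata


def same_text(a, b, form="NFC"):
    """Compare two strings after normalizing both to the same Unicode form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


if __name__ == "__main__":
    precomposed = "\u00e9"   # é as one code point
    decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
    print(precomposed == decomposed)  # False: different code point sequences
    for form in ("NFC", "NFD"):
        norm = unicodedata.normalize(form, precomposed)
        print(form, same_text(precomposed, decomposed, form),
              [f"U+{ord(c):04X}" for c in norm])
```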

Javier
