LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Sender:	Mailing list for the LaTeX3 project <[log in to unmask]>
Subject:	Re: Multilingual Encodings Summary 2.2
From:	Javier Bezos <[log in to unmask]>
Date:	Sun, 13 May 2001 19:55:58 +0100
Reply-To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Parts/Attachments:	text/plain (88 lines)

>>And
>>regarding font transformation, they should be handled by fonts, but
>>the main problem is that metric information (i.e., tfm) cannot be
>>modified from within TeX, except a few parameters; I really wonder
>>if allowing more changes, mainly ligatures, is feasible (that
>>solution would be better than font ocp's and vf's, I think).
>
> I don't understand this. What kind of font transformations are you
> referring to?

For example, removing the fi ligature in Turkish, or using an alternate
orthography in languages with contextual analysis.
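The argument here is that such changes belong in the fonts. For comparison only, the sketch below shows a text-level workaround in Python. It inserts U+200C ZERO WIDTH NON-JOINER, which Unicode defines as a hint to suppress a ligature, between "f" and a dotted "i". The function name `break_fi_ligature` and the "tr" language code convention are assumptions made for illustration. This is not how the post proposes to do it.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER: Unicode's hint to break a ligature

def break_fi_ligature(text, lang):
    """Hypothetical helper: insert ZWNJ between 'f' and dotted 'i' in Turkish text.

    The fi ligature hides the dot of 'i', which in Turkish is distinct from
    dotless 'ı' (U+0131), so only 'f' + 'i' is touched.
    """
    if lang != "tr":
        return text
    # Zero-width match: after an 'f' and before an 'i'
    return re.sub(r"(?<=f)(?=i)", ZWNJ, text)

print(repr(break_fi_ligature("fincan", "tr")))  # 'f\u200cincan'
```

Note that the text itself now carries rendering information. That is the drawback of handling this outside the font.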

>>Semantically or visually?
>
> I suspect Frank considers meaning to be a semantic concept, not a visual one.

I also suspect that, but then if we pick a char it will be
undefined visually, and its rendering (and TeX is essentially about
rendering) will _always_ need additional information about the context
(example: traditional ideograms in Japanese vs. simplified ones in
Chinese).

> I believe one of the main problems for multilinguality in LaTeX today is
> that there is no way of recording (or maybe even of determining) the
> current context so that this information can be moved around with every
> piece of code affected by it. Hence most current commands strive instead to
> convert the code to a context-free representation (the LICR) by use of
> protected expansion.

In that case, we must find a way. Without it, proper rendering is
impossible. Of course, we may write an ideogram to the aux file as
a macro; for example, ai (love) can be written as \japaneseai or
\chineseai depending on the context in which it is written, but that
means the resulting code is not very different from the current mess
with Russian, where we have \cyrA, \cyrB, etc. That's exactly what I
want to avoid. Of course, that also means that changing things depending
on the target will become more difficult.

Further, by doing so we are again creating a closed system
with its own conventions and no links to external tools adapted
to Unicode. I can process a file and extract information from it
very easily with, say, Python if it uses a known representation
(ISO encodings or Unicode), but if we have to parse things like \japaneseai
or similar, things become more difficult. I think it's a lot easier
to move information along with blocks of text than with single chars.
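To make the contrast concrete, here is a minimal Python sketch. It assumes a hypothetical layout in which the language is recorded once per block of text and the characters stay plain Unicode (U+611B is "ai"). With that layout, the standard `unicodedata` module is enough to find the ideographs. The per-character macro form needs a private decoding table in every tool. The names `blocks`, `MACROS`, `ideographs` and `macro_chars` are illustrative only.

```python
import re
import unicodedata

# Hypothetical convention: one language tag per block, plain Unicode inside.
blocks = [("ja", "\u611bの歌"), ("zh", "\u611b情")]

def ideographs(text):
    """Return the CJK unified ideographs in text, found via their Unicode names."""
    return [ch for ch in text
            if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")]

# Hypothetical per-context macros: every tool must carry a table like this one.
MACROS = {"japaneseai": ("ja", "\u611b"), "chineseai": ("zh", "\u611b")}

def macro_chars(text):
    """Map each control word in text back to (language, character) via MACROS."""
    return [MACROS[name] for name in re.findall(r"\\([A-Za-z]+)", text)]

for lang, text in blocks:
    print(lang, ideographs(text))
print(macro_chars(r"\japaneseai の歌 \chineseai 情"))
```

In the first form, the context travels with the block. In the second, it is smeared across every character and can only be recovered by knowing the macro names.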

I don't understand why we cannot determine the current language
context; either I'm missing something or I'm being very optimistic about
the capabilities of TeX. Could you please give an example where
the current language cannot be determined and/or moved?

>>> But such characters (the Spanish as well as the Hebrew) aren't allowed in
>>> names in LaTeX!
>>
>>But they should be allowed in the future if we want a true
>>multilingual environment.
>
> Why? They are not part of any text, but part of the markup!

Are you suggesting that Japanese, Chinese, Tibetan, Arab, Persian,
Greek, Russian, etc. users must *always* use the Latin alphabet?
That's not truly multilingual. It may be of interest to Western
scholars, but not to people actually using these scripts and
keyboards for these scripts. (Mixing right-to-left scripts with
Latin is particularly messy.)
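Allowing such names mostly means deciding which characters count as "letters" in a name (in TeX terms, which ones get catcode 11). The Python sketch below uses one possible rule, based on Unicode general categories. The rule is an assumption for illustration: any letter (L*) or combining mark (M*) is accepted. It is not an existing LaTeX or Omega convention, and `is_name_char`/`is_valid_name` are hypothetical names.

```python
import unicodedata

def is_name_char(ch):
    # Assumption: Unicode letters (L*) and combining marks (M*) may appear in names
    return unicodedata.category(ch)[0] in ("L", "M")

def is_valid_name(name):
    """True if name is non-empty and every character passes is_name_char."""
    return bool(name) and all(is_name_char(ch) for ch in name)

for name in ["section", "t\u00edtulo", "\u05e9\u05dd", "\u540d\u524d", "my-name"]:
    print(name, is_valid_name(name))
```

A rule like this says nothing about bidirectional display. Right-to-left names mixed with Latin markup would still need separate handling in editors.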

> Isn't the \char primitive in Omega able to produce arbitrary characters
> (at least arbitrary characters in the Basic Multilingual Plane)?

Not exactly. The \char primitive produces a char, but not an intrinsically
Unicode one: ocp's are also applied to \char (and therefore the resulting
characters are transcoded).

> It looks quite reasonable to me, and it is certainly much better than the
> processing depicted in the example. Does this mean that the example should
> rather be
>
>     A     B        C          D        E
>    \'e   \'e   e^^^^0301   ^^^^00e9   ^^e9

As currently implemented, yes, it should. I'm still not sure whether
normalizing in this way is the best solution. However, I find the
arguments in the Unicode book in favour of it quite convincing.
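Columns C and D of the quoted table correspond to Unicode canonical composition (normalization form NFC). Here is a minimal Python sketch using the standard `unicodedata` module. Using ISO 8859-1 for column E is an assumption made for illustration; ^^e9 is é in that encoding, among others.

```python
import unicodedata

decomposed = "e\u0301"                               # column C: e + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)  # column D: U+00E9
eight_bit = composed.encode("latin-1")               # column E, assuming ISO 8859-1

print(hex(ord(composed)), eight_bit)                 # 0xe9 b'\xe9'
# NFD undoes the composition, so C and D are canonically equivalent
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```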

Regards
Javier
