LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	Re: Multilingual Encodings Summary
From:	Roozbeh Pournader <[log in to unmask]>
Reply To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Date:	Tue, 13 Feb 2001 19:51:03 +0330

I'm sorry I can't correct your term. My English is bad enough to generate
ambiguities. I will only comment:

On Tue, 13 Feb 2001, Marcel Oliver wrote:

[Regarding UTF8]
> - Diagnostic messages could (although not with current TeX engine) be
>   output in the correct script.

Really? Not with current TeX engine? Why?

> - The "combining characters" of Unicode are difficult to handle with a
>   TeX based parser.  (Does "difficult" mean "impossible to get
>   right"???  What are the issues???)

Every letter would have to be made active, so that it looks ahead for the
combining character sequence that follows it and then puts that sequence
over its own head! I don't think this is impossible; you need to loop
until a non-combining character is found.
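
The loop described above can be sketched outside TeX. The following Python
fragment is only an illustration of the look-ahead idea, not a TeX
implementation; the function name group_combining is hypothetical, and it
assumes that "combining" means a non-zero canonical combining class as
reported by Python's unicodedata module.

```python
import unicodedata

def group_combining(text):
    """Split text into clusters: a base character plus the combining marks after it."""
    clusters = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch) != 0:
            # Attach the mark to the base character that precedes it.
            clusters[-1] += ch
        else:
            # A non-combining character ends the previous cluster and starts a new one.
            clusters.append(ch)
    return clusters

# "a" + COMBINING DIAERESIS, "b", "c" + COMBINING ACUTE ACCENT + COMBINING DOT BELOW
clusters = group_combining("a\u0308bc\u0301\u0323")
print([[f"U+{ord(c):04X}" for c in cluster] for cluster in clusters])
```

A TeX macro would do the same thing by peeking at the next token instead of
iterating over a string, but the stopping condition is the one named above:
the first character that is not a combining mark.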

> - The output encoding is limited to 8 bit fonts, which may not be
>   enough to get correct kerning for some languages. (Can someone
>   confirm or correct this???)

We need some examples. I can't find any.

> - How are combining characters handled, in particular when they
>   represent a glyph that has also its own Unicode slot?  The main
>   issue is hyphenation.  How do Unicode capable word processors handle
>   this?

The term is "normalization". Unicode defines some sequences to be exactly
equivalent. For example, a" can be coded as <A>+<COMBINING UMLAUT> or as
<A-UMLAUT>. Software should behave exactly the same when encountering
either of the two. Applications are advised to use whichever normalization
form is more appropriate for their usage. Not having played with them
enough, I can't suggest which form is appropriate for LaTeX. The reference
is:

        http://www.unicode.org/unicode/reports/tr15
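
As a small illustration of this equivalence, here is a sketch using Python's
standard unicodedata module (a choice made for this example, unrelated to
any TeX tooling). It shows that the two codings differ as raw code point
sequences but coincide once both are brought to the same normalization
form. Which form would suit LaTeX is left open, as above.

```python
import unicodedata

decomposed = "a\u0308"    # LATIN SMALL LETTER A + COMBINING DIAERESIS
precomposed = "\u00e4"    # LATIN SMALL LETTER A WITH DIAERESIS

# Compared code point by code point, the two strings are different.
raw_equal = decomposed == precomposed

# NFC composes the sequence; NFD decomposes the single character.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(raw_equal, nfc_equal, nfd_equal)
```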

> - Unicode is still changing, especially with respect to math
>   characters.  Does this prevent us from getting correct basic
>   infrastructure in place?

With math yes, but with other things no, the model is getting stable.

We will need comments from David.

> - Requirements for non-European scripts that have not been adequately
>   addressed?

I don't believe that is the case. If anything, they are sometimes
addressed more than needed. The only complaint that comes up often:

- People who use derived scripts, such as Japanese, Persian, or Yiddish,
complain that they can't have both the original script and the derived one
in a plain text file. They want language-switching mechanisms for "plain
text".

This is not a problem for LaTeX, since LaTeX has higher-level markup
mechanisms.

> As explained above, the TeX engine has limited capabilities for
> multilingual typesetting and requires some rather awkward workarounds
> for non-English languages.  Omega with its internal Unicode
> representation is certainly an alternative.  What is the current state
> of Omega, what are potential problems, and are there other
> possibilities?

The main problem with Omega seems to be feature instability. But with both
Unicode and the i18n model stabilizing in the outside world, Omega is also
stabilizing.

One of the good things about Omega (I hope) will be its new development
model. It may become really feature-rich.

> - It appears that Omega uses a 16 bit internal representation.  Is
>   this a restriction that may cause problems later, when someone finds
>   needed glyphs are outside the 16 bit range?

Omega has already changed its internal representation to 32 bits, so it
will be able to use Unicode's "supplementary characters". This will solve
the glyph code problem, and because Unicode only uses code points up to
U+10FFFF, there is a lot of space left if we want additional internal
glyphs.
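
The arithmetic behind this can be checked quickly. The sketch below is in
Python and only illustrates the code point ranges; it says nothing about
how Omega actually lays out its internal values. The sample character
U+1D49C (MATHEMATICAL SCRIPT CAPITAL A) is chosen here as an example of a
supplementary character.

```python
MAX_UNICODE = 0x10FFFF          # highest code point Unicode assigns

ch = "\U0001D49C"               # MATHEMATICAL SCRIPT CAPITAL A
cp = ord(ch)

fits_16 = cp < 2**16            # does it fit in a single 16-bit value?
fits_32 = cp < 2**32            # does it fit in a single 32-bit value?

# In UTF-16 a supplementary character needs two 16-bit code units.
utf16_units = len(ch.encode("utf-16-be")) // 2

# 32-bit values that lie beyond the Unicode range and could serve other purposes.
spare = 2**32 - (MAX_UNICODE + 1)

print(hex(cp), fits_16, fits_32, utf16_units, spare)
```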

> - Would the decision to move beyond TeX cause a feature explosion in
>   the engine that would be difficult to control?  On the other hand,
>   are there features in e-TeX, NTS and friends that are deemed
>   essential or highly desirable, but are not provided by Omega?

Omega has dropped support for the TeX--XeT primitives: \beginL, \endL,
\beginR, \endR. They are available in e-TeX, and existing software uses
them. Omega replaces them with a stronger mechanism, though, and I think
they can be redefined as macros in terms of the new primitives.

> - Judging from past release schedules, LaTeX will receive a major
>   upgrade about once every 10 years.  So if we wait until 2014 to get
>   state-of-the art international support, we may lose a lot of
>   potential users.

We're losing them right now.

--roozbeh
