Date: Wed, 14 Feb 2001 09:41:51 +0100
Hello,
On Tue, 13 Feb 2001, Frank Mittelbach wrote:
> > - On all major platforms, support for editing and displaying UTF8
> > exists and either is currently moving into mass deployment. Major
> > programming languages have UTF8 libraries, so the basic
> > infrastructure for UTF8 is or will be in place shortly.
>
> remains to be seen. in the long term most likely yes, but how many of the
> people on this list can easily (in their favorite editing system) edit or
> generate a utf8 encoded file? hands up?
The standard encoding of BeOS is UTF8. I don't know whether the number of
TeX installations under BeOS exceeds, say, 100, though.
I don't think that Omega or NTS will replace TeX anytime soon, so here
are some rough ideas on how to implement Unicode support in TeX:
(a) Internally, Unicode characters can be encoded as control sequences of
the form \<some hex sequence>, i.e., `A' would become `\0041'.
(b) Each font would define these sequences appropriately, i.e.,
`\def\0041{A}'. Characters not included in the font would produce an
error message.
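A minimal sketch of (a) and (b), assuming the usual \csname trick to get
around the fact that digits are not letters (the name \0041 is just this
mail's convention, and the error-default shown is my own illustration):

```tex
% Digits have catcode 12, so \0041 must be built via \csname:
\expandafter\def\csname 0041\endcsname{A}%  U+0041 -> glyph `A'
% A font could map an unsupported character to an explicit error:
\expandafter\def\csname 00DF\endcsname{%
  \errmessage{Character U+00DF not available in this font}}
% Usage: \csname 0041\endcsname typesets `A'.
```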
(c) To convert the input file to the internal representation, one could
write a preprocessor in TeX which is invoked by the \documentclass
command. That's IMHO the easiest way, and I don't think the runtime
penalty would be that great. The preprocessor should leave control
sequences and braces alone, i.e., `\begin{bar}' would become
`\begin{\0062\0061\0072}'. The only problem I see with this approach
is \catcode changes.
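Inline, the conversion in (c) could be done by making the input bytes
active, which is roughly what such a catcode-based preprocessor would
amount to (a sketch, assuming Latin-1 input for simplicity; the
definitions are illustrative, not tested against any real font setup):

```tex
% Make byte E4 (Latin-1 a-umlaut) active and map it to the
% internal form; the font-level definition supplies the glyph:
\catcode`\^^e4=\active
\def^^e4{\csname 00E4\endcsname}
\expandafter\def\csname 00E4\endcsname{\"a}
```

This also makes the \catcode problem concrete: inside verbatim
material the byte loses its active catcode, so the mapping silently
stops applying there.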
Any thoughts?
Achim
--
________________________________________________________________________
| \_____/ |
Achim Blumensath \O/ \___/\ |
Mathematische Grundlagen der Informatik =o= \ /\ \|
www-mgi.informatik.rwth-aachen.de/~blume /"\ o----|
____________________________________________________________________\___|