LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	Re: LaTeX's internal char representation (UTF8 or Unicode?)
From:	Frank Mittelbach <[log in to unmask]>
Reply To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Date:	Sun, 11 Feb 2001 20:38:40 +0100

I asked the question:

 > >  wouldn't it be better if the internal LaTeX representation were Unicode
 > >  in one flavor or another?

Roozbeh replied:

 > What about symbol fonts like TC? What about math characters that are
 > unified in Unicode (\rightarrow and \longrightarrow)? What about the
 > things that are not yet in Unicode?

Yes, what about them?

As I have already outlined in other replies, I don't think that Unicode or UTF-8
is the answer as far as the LICR (LaTeX internal character representation) is
concerned. It can provide only a partial answer:

 - it clearly can't provide the answer for characters that don't exist in Unicode
 - and it clearly can't provide the answer for math

However, the LICR (or at least the part I'm talking about) isn't really
concerned with math, which needs a far richer, or let's say different, handling
anyway, and which, on the other hand, doesn't need some of the mechanisms
required for text representation, such as awareness of certain types of font
attribute changes.

 > >  - however, it is not clear that the resulting names are easier to read,
 > >    e.g., \unicode{00e4} vs. \"a.
 >
 > They are worse than you may think. They are always hard to read. My real
 > work is related to Unicode Arabic script, and after two years of full
 > dedication, I can't recall more than a few codes. I always need a table at
 > hand. I have much less experience with Knuthian names of math symbols, but
 > I'm sure I can recall the names of more than 95% of them without any
 > problem.

So you agree with me: they aren't easy to read :-) But then, being "internal",
this matters only in some circumstances, and Oliver put forward some good
arguments for cases where something like UTF-8 might actually be easier to read.
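To make the readability comparison concrete, here is a small Python sketch (an editorial illustration, not code from the thread) that renders the same word three ways: with LICR-style accent commands, in the hypothetical `\unicode{xxxx}` notation discussed above, and as raw UTF-8 bytes. The table `LICR_NAMES` and both helper functions are assumptions that cover only a handful of letters; they are not LaTeX's real encoding tables.

```python
# Hypothetical, deliberately tiny table: Unicode letter -> LICR accent command.
# Real LaTeX encoding definition files cover many more characters.
LICR_NAMES = {
    "\u00e4": '\\"a',   # a with diaeresis
    "\u00f6": '\\"o',   # o with diaeresis
    "\u00fc": '\\"u',   # u with diaeresis
    "\u00e9": "\\'e",   # e with acute accent
}


def as_licr(text: str) -> str:
    """Replace known accented letters by LICR commands; keep everything else."""
    return "".join(LICR_NAMES.get(ch, ch) for ch in text)


def as_unicode_macro(text: str) -> str:
    """Write every non-ASCII character in the hypothetical \\unicode{hex} form."""
    return "".join(
        ch if ord(ch) < 128 else "\\unicode{%04x}" % ord(ch) for ch in text
    )


word = "M\u00fcller"
print(as_licr(word))           # M\"uller
print(as_unicode_macro(word))  # M\unicode{00fc}ller
print(word.encode("utf-8"))    # b'M\xc3\xbcller'
```

The `\"u` form stays readable for Latin-based text, while the hex form is opaque without a code table, which is the point both posters agree on.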

 > >  - with intermediate forms like data written to files this could be a pain,
 > >    and people in Russia, for example, already have this problem when they see
 > >    something like \cyr\CYRA\cyrn\cyrn\cyro\cyrt\cyra\cyrc\cyri\cyrya. If
 > >    Unicode were the internal representation, this would be true for all
 > >    languages (except English), while currently the Latin-based ones are
 > >    still basically okay.
 >
 > This is a place where UTF-8 helps a lot. People can use Unicode text
 > editors to view the files, or use widely available converters such as
 > iconv to convert to, in theory, any charset.

Yes and no. I tried to explain that the current implementation of the major
underlying formatter (i.e., TeX) imposes limitations that you can't easily
overcome, and even if you do, the solution then takes a long time to actually
be deployed at sites that have little use for anything other than ASCII plus
perhaps a few accents.
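Roozbeh's point about UTF-8 and iconv can be illustrated with the Cyrillic example quoted above. The following Python sketch (an editorial illustration, not code from the thread) turns the LICR string into Unicode text and then encodes it as UTF-8 and as KOI8-R, roughly what `iconv -f UTF-8 -t KOI8-R` would produce. The mapping table and the helper `licr_to_text` are hypothetical and cover only the letters of this one word; treating `\cyr` as a switch that produces no character is also an assumption.

```python
import re

# Hypothetical, partial mapping from LICR Cyrillic commands to Unicode.
# It covers only the letters of the example word.
LICR_TO_UNICODE = {
    r"\CYRA": "\u0410",   # CYRILLIC CAPITAL LETTER A
    r"\cyra": "\u0430",   # CYRILLIC SMALL LETTER A
    r"\cyrn": "\u043d",   # CYRILLIC SMALL LETTER EN
    r"\cyro": "\u043e",   # CYRILLIC SMALL LETTER O
    r"\cyrt": "\u0442",   # CYRILLIC SMALL LETTER TE
    r"\cyrc": "\u0446",   # CYRILLIC SMALL LETTER TSE
    r"\cyri": "\u0438",   # CYRILLIC SMALL LETTER I
    r"\cyrya": "\u044f",  # CYRILLIC SMALL LETTER YA
}

# A TeX control word is a backslash followed by letters; anything else
# is taken one character at a time.
TOKEN = re.compile(r"\\[A-Za-z]+|.", re.DOTALL)


def licr_to_text(licr: str) -> str:
    out = []
    for tok in TOKEN.findall(licr):
        if tok == r"\cyr":
            continue  # assumed font/encoding switch: no character output
        out.append(LICR_TO_UNICODE.get(tok, tok))  # unknown tokens pass through
    return "".join(out)


word = licr_to_text(r"\cyr\CYRA\cyrn\cyrn\cyro\cyrt\cyra\cyrc\cyri\cyrya")
utf8_bytes = word.encode("utf-8")   # what a Unicode-aware editor would read
koi8_bytes = word.encode("koi8_r")  # one byte per letter, like iconv's output

print(len(utf8_bytes))  # 18
print(len(koi8_bytes))  # 9
```

Once the text is in a standard encoding, any converter can reach other charsets; the catch Frank raises is that TeX itself first has to read and write such files.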

 > Unicode also has the equivalent of \"; it just appears after the letter.
 > So accented letters that are not in Unicode are not a real problem: they
 > can also be built in Unicode. But I don't know what you are going to do
 > about the combining accent appearing after the letter.

Ah, here is the remark I was searching for an hour ago:

Nothing, really, and that is a problem as long as I want to stick with TeX and
a bit of its parsing machinery. That means I can't make use of this concept.

frank
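The ordering problem in the last exchange can be shown with Python's standard `unicodedata` module (an editorial sketch, not code from the thread). In Unicode's decomposed form the combining diaeresis U+0308 follows its base letter, whereas LaTeX's `\"` precedes it, so a converter has to look at what comes after each base letter before it can emit anything. The table `COMBINING_TO_LICR` and the function `unicode_to_licr` are hypothetical and handle only three accents.

```python
import unicodedata

# Hypothetical table: Unicode combining mark -> LaTeX accent command.
COMBINING_TO_LICR = {
    "\u0308": '\\"',  # COMBINING DIAERESIS    -> \"
    "\u0301": "\\'",  # COMBINING ACUTE ACCENT -> \'
    "\u0300": "\\`",  # COMBINING GRAVE ACCENT -> \`
}


def unicode_to_licr(text: str) -> str:
    """Rewrite base letter + combining marks as LaTeX prefix accents.

    Unicode stores the mark AFTER the base letter, LaTeX writes the accent
    command BEFORE it, so each base character is emitted only after
    looking ahead at the marks that follow it.
    """
    chars = unicodedata.normalize("NFD", text)  # split precomposed letters
    out = []
    i = 0
    while i < len(chars):
        result = chars[i]
        j = i + 1
        # Lookahead: wrap the base in every known combining mark after it.
        while j < len(chars) and chars[j] in COMBINING_TO_LICR:
            result = COMBINING_TO_LICR[chars[j]] + "{" + result + "}"
            j += 1
        out.append(result)
        i = j
    return "".join(out)


print(unicode_to_licr("\u00e4"))   # \"{a}  (precomposed a with diaeresis)
print(unicode_to_licr("a\u0308"))  # \"{a}  (a + combining diaeresis)
print(unicodedata.normalize("NFC", "a\u0308") == "\u00e4")  # True
```

Converting outside TeX is easy because the whole string is available up front; the difficulty described above is doing the same reordering inside TeX's left-to-right processing.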
