Date: Sun, 11 Feb 2001 20:38:40 +0100
I asked the question:
> > wouldn't it be better if the internal LaTeX representation would be Unicode
> > in one or the other flavor?
Roozbeh replied:
> What about symbol fonts like TC? What about math characters that are
> unified in Unicode (\rightarrow and \longrightarrow)? What about the
> things that are not yet in Unicode?
yes, what about them?
as I outlined already in other replies, I don't think that Unicode or UTF8 is
the answer as far as LICR (the LaTeX internal character representation) is
concerned. it can only provide a partial answer:
- it clearly can't provide the answer for chars that don't exist in Unicode
- and it clearly can't provide the answer for math
however, LICR (or the part I'm talking about) isn't really concerned with math,
which needs a far richer, or let's say different, handling anyway, and which
on the other hand doesn't need some of the mechanisms needed for text
representations, like being aware of certain types of font attribute changes
etc.
> > - however, not clear is that the resulting names are easier to read, eg
> > \unicode{00e4} viz \"a.
>
> They are worse than you may think. They are always hard to read. My real
> work is related to Unicode Arabic script, and after two years of full
> dedication, I can't recall more than a few codes. I always need a table at
> hand. I have much less experience with Knuthian names of math symbols, but
> I'm sure I can recall the names of more than 95% of them without any
> problem.
so you agree with me, they aren't easy to read :-) but then, being "internal",
this only matters in some circumstances, and Oliver put forward some good
arguments for cases where something like UTF8 might actually be easier to read.
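
As a rough illustration of the readability question, here is a small Python
sketch that writes the same character in three ways. It is only a sketch:
`\unicode{...}` is the hypothetical notation from the quoted example above,
not an existing LaTeX command, and the variable names are made up.

```python
ch = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# Accent command plus base letter, as in the current internal form.
licr_form = r'\"a'

# Hypothetical code point notation taken from the quoted example.
code_point_form = r"\unicode{%04x}" % ord(ch)

# The raw UTF-8 bytes an editor or file would actually contain.
utf8_form = ch.encode("utf-8")

print(licr_form)        # \"a
print(code_point_form)  # \unicode{00e4}
print(utf8_form)        # b'\xc3\xa4'
```

The first form can be read without a table, the second cannot, and the third
is only readable in a tool that decodes UTF-8.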
> > - with intermediate forms like data written to files this could be a pain and
> > people in Russia, for example, already have this problem when they see
> > something like \cyr\CYRA\cyrn\cyrn\cyro\cyrt\cyra\cyrc\cyri\cyrya. In case
> > of unicode as the internal representation this would be true for all
> > languages (except English) while currently the Latin based ones are still
> > basically okay.
>
> This is a place where UTF8 helps a lot. People can use Unicode text
> editors to see the files, or use the widely available convertors like
> iconv to convert to theoretically every charset.
yes and no. I tried to explain that there are limitations posed by the current
implementation of the major underlying formatter (ie TeX) which you can't
easily overcome, and even if you do, the result then needs a long time to
actually get deployed at sites that have not much use for anything other
than ASCII plus perhaps a few accents.
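
To make the file-readability point concrete, the following Python sketch
decodes the Cyrillic example quoted above into plain text and then re-encodes
it, much as one would do with iconv. The mapping table covers only the names
in that example, and `licr_to_text` is a made-up helper, not part of any
LaTeX or Python package.

```python
import re

# Only the command names used in the quoted example are mapped here.
CYR = {
    "CYRA": "А", "cyrn": "н", "cyro": "о", "cyrt": "т",
    "cyra": "а", "cyrc": "ц", "cyri": "и", "cyrya": "я",
}

def licr_to_text(s):
    """Turn a run of \\cyr... commands into readable Unicode text."""
    out = []
    for name in re.findall(r"\\([A-Za-z]+)", s):
        if name == "cyr":  # encoding switch, produces no character
            continue
        out.append(CYR[name])
    return "".join(out)

licr = r"\cyr\CYRA\cyrn\cyrn\cyro\cyrt\cyra\cyrc\cyri\cyrya"
text = licr_to_text(licr)

utf8_bytes = text.encode("utf-8")                        # what a UTF8 file would hold
koi8_bytes = utf8_bytes.decode("utf-8").encode("koi8_r") # iconv-style conversion

print(text)             # Аннотация
print(len(utf8_bytes))  # 18
print(len(koi8_bytes))  # 9
```

The conversion itself is easy. The limitation described above is on the TeX
side, which has to read such bytes in the first place.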
> Unicode also has the equivalent of \", it only appears after the letter.
> So the problem of an accented letter not in Unicode is not a real problem,
> these letters can also be made in Unicode. But I don't know what are you
> going to do with the combining accent appearing after the letter.
ahh, here is the remark I was searching for an hour ago:
nothing really, and that is a problem as long as I want to stick with TeX and a
bit of its parsing machinery. and that means I can't make use of this concept.
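
The ordering mismatch can be sketched in Python as follows. Unicode puts the
combining accent after the base letter, while a TeX accent command comes
before it, so a converter has to hold back each letter until it knows whether
an accent follows. The table is a small illustrative subset and
`to_prefix_accents` is a hypothetical helper, not an existing tool.

```python
import unicodedata

# Small subset of combining marks and the TeX accent commands they map to.
TEX_ACCENTS = {
    "\u0308": '\\"',  # combining diaeresis
    "\u0301": "\\'",  # combining acute
    "\u0300": "\\`",  # combining grave
    "\u0302": "\\^",  # combining circumflex
    "\u0303": "\\~",  # combining tilde
}

def to_prefix_accents(text):
    """Rewrite letter + combining accent as accent command + {letter}."""
    out = []
    # NFD splits precomposed letters into base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", text):
        if ch in TEX_ACCENTS and out:
            base = out.pop()  # the accent applies to what came just before it
            out.append(TEX_ACCENTS[ch] + "{" + base + "}")
        else:
            out.append(ch)
    return "".join(out)

print(to_prefix_accents("K\u00e4se"))  # K\"{a}se
print(to_prefix_accents("a\u0308"))    # \"{a}
```

The `out.pop()` step is the lookbehind that a left-to-right macro expansion
does not offer for free, which is why the combining form is hard to use while
sticking with TeX's parsing.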
frank