LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE


Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Re: LaTeX's internal char representation (UTF8 or Unicode?)

Lars Hellström

Tue, 13 Feb 2001 11:37:29 +0100


At 21.58 +0100 2001-02-12, Chris Rowley wrote:
>I can now ask the following questions:
>
>Do the designers of Omega think that it needs or has a TRM?
>
>Do the designers of LaTeX-for-Omega think that it needs a TRM?

Despite being neither, I would like to state that I think something of that
kind will be very useful (and probably necessary). My reason for this is
my experience with the "harmless character strings" I implemented in the
xdoc package (see CTAN:macros/latex/exptl/xdoc/) as developing some sort of
reasonable data type for text strings made it much easier to pass them
around and do things to them (such as create useful sort keys for indices).
It needs to be stressed, though, that harmless character strings are
something quite different from the TRM Chris writes about, as I try to
describe what some piece of code was "before TeX saw (tokenized) it",
whereas the TRM seems to be what it is well inside TeX.
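To make the idea of "a reasonable data type for text strings" concrete, here is a minimal Python sketch. It is not xdoc's implementation (xdoc is written in TeX macros); the class name `TextString` and the sort-key rule (keep letters and digits, casefold them, drop everything else) are hypothetical choices for illustration. The value keeps the characters as they appeared in the source, before any tokenization, and derives an index sort key from them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextString:
    """Hypothetical text-string value: characters as written in the source,
    before any tokenization (backslashes and braces are plain characters)."""

    chars: str

    def sort_key(self) -> str:
        # Illustrative rule only: keep letters and digits, casefolded; drop the rest.
        return "".join(c.casefold() for c in self.chars if c.isalnum())


entries = [TextString("Zebra"), TextString(r"\@gobble"), TextString(r"\addto")]
for entry in sorted(entries, key=TextString.sort_key):
    print(entry.chars)
# \addto
# \@gobble
# Zebra
```

The design choice is that the raw characters are never altered; every derived form (here only the sort key) is computed from them on demand, so the same value can be passed around and reused for other purposes.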

Before the above, Chris wrote:
>This is a thing that enables a computer-based system for processing
>`text' to represent `text things' so that it can, easily and
>independently, do at least the following (not formal definitions):
>
>-- apply transformations to `text strings';

xdoc does some things of this kind, although they are probably not very relevant to
the current context. Perhaps some existing Omega applications provide
better examples?

>-- reason about `text strings';
>
>-- construct more concrete representations of `text strings' as
>   `relatively positioned unrendered graphical objects';
>
>-- reason about such representations of text strings.

Could you please clarify these last two items? What properties would these
things have; would they have, e.g., a width? Or is this the kind of thing that
becomes trivial in Latin and similar scripts?

>A TRM is none of the following (although for efficiency of
>implementation it may well be closely related to them):
>
>-- a coding for `text files' (such as utf8 or ASCII);
>
>-- an encoding for strings of unrendered glyphs (such as the `text
>   strings' in a dvi file or pdf file);

One thing (not particularly related to the existence of a TRM) which would
most likely be needed in the "glorious successor of TeX" is some way of
converting the latter kind of text string (glyphs in a font encoding) to the former kind, for
use in diagnostic and error messages. Already today the contents of overfull
hboxes containing math can be very hard to work out from the log messages.
But it is probably easier to set up such a conversion if there is a TRM,
since then you "only" need to explicitly define conversions of everything
to and from the TRM, instead of separate conversions from each font
encoding to e.g. UTF-8 (and any other output file encoding that might be in
use).
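The counting argument (N font encodings and M output encodings need N + M converters through a TRM rather than N × M pairwise ones) can be sketched in Python, where a `str` (Unicode) stands in for the TRM. Everything here is hypothetical: the function names, the `[0xNN]` placeholder for unknown slots, and the slot tables, which cover only a few OT1 and T1 positions (the fi ligature and the en and em dashes) plus ASCII letters and digits, not full encodings.

```python
from typing import Iterable

# Hypothetical sketch: a Python str (Unicode) plays the role of the TRM/hub.
# Each font encoding needs one decoder *to* the hub and each output encoding
# one encoder *from* it: N + M converters instead of N * M pairwise ones.

# Partial slot tables (slot -> hub text); a real table would cover every slot.
OT1_TO_HUB = {0x0C: "fi", 0x7B: "\u2013", 0x7C: "\u2014"}  # fi, en dash, em dash
T1_TO_HUB = {0x1C: "fi", 0x15: "\u2013", 0x16: "\u2014"}   # fi, en dash, em dash

FONT_DECODERS = {"OT1": OT1_TO_HUB, "T1": T1_TO_HUB}


def font_to_hub(slots: Iterable[int], font_encoding: str) -> str:
    """Decode font slots (e.g. the glyphs of an overfull hbox) into hub text."""
    table = FONT_DECODERS[font_encoding]
    parts = []
    for slot in slots:
        ch = chr(slot)
        if slot in table:
            parts.append(table[slot])
        elif ch.isascii() and ch.isalnum():
            # ASCII letters and digits occupy their ASCII slots in OT1 and T1.
            parts.append(ch)
        else:
            # Slot not covered by this partial table: show it rather than guess.
            parts.append(f"[0x{slot:02X}]")
    return "".join(parts)


def hub_to_output(text: str, output_encoding: str) -> bytes:
    """Encode hub text for a log file; unencodable characters become '?'."""
    return text.encode(output_encoding, errors="replace")


if __name__ == "__main__":
    hub = font_to_hub([0x0C, 0x6E, 0x64, 0x7B, 0x31], "OT1")
    print(hub_to_output(hub, "ascii"))  # b'find?1'
    print(hub_to_output(hub, "utf-8"))  # b'find\xe2\x80\x931'
```

In this sketch, supporting another font encoding means adding one more table, while another output encoding (say Latin-1) needs no new table at all, because Python's codecs already supply the encoder from the hub.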

Lars Hellström
