As of late, I've spent quite some time thinking about font encodings and
related matters; in particular I have considered the matter of how they
should be specified. The discussions on this list these last months about
problems related to multilinguality have been quite inspiring (even though
I probably wouldn't have done much about it had not also some other
projects I've been working on presented a need for clarifications in this
area) and I have now compiled my thoughts in a paper on the matter, which
can be found at

  http://abel.math.umu.se/~lars/encodings/

Comments on its contents are welcome; I'd like to see a discussion about it
here on this list so that there could be an "official" acceptance or
rejection of the ideas expressed therein. (In the former case, this paper
might perhaps evolve into an "encguide".)

One matter in particular which I believe is of interest on this list is the
following passage about the output of LaTeX and the corresponding attempt
at defining what a LaTeX font encoding really is:
%%%%%%%%%%%%%%%%%%%%
On its way out of \LaTeX\ towards the printed text, a character passes
through a number of stages. The following five seem to cover what is
relevant for the present discussion:
\begin{enumerate}
  \item \emph{\LaTeX\ Internal Character Representation} (LICR)~%
    \cite{LICR}. At this point the character is a character token
    (e.g.~|a|), a text command (e.g.~|\ss|), or a combination
    (e.g.~|\H{o}|).
  \item \emph{Horizontal material;} this is what the character is
    en route from \TeX's mouth to its stomach. For most characters
    this is equivalent to a single |\char| command (e.g.\ |a| is
    equivalent to |\char|\,|97|), but some require more than one, some
    are combined using the |\accent| and |\char| commands, some
    involve rules and\slash or kerns, and some are built using boxes
    that arbitrarily combine the above elements.
  \item \emph{DVI commands;} these are the DVI file commands that
    produce the printed representation of the character.
  \item \emph{Printed text;} this is the graphical representation of
    the character, e.g.\ as ink on paper or as a pattern on a computer
    screen. Here the text consists of glyphs.
  \item \emph{Interpreted text;} this is essentially printed text
    modulo equivalence of interpretation, hence the text doesn't really
    reach this stage until someone reads it. Here the text consists of
    characters.
\end{enumerate}
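
As a rough illustration of the step from the first stage to the second,
the sketch below lists the horizontal material that the three LICR
examples above would be expected to produce under the OT1 and T1
encodings. The slot numbers are assumptions quoted from the standard
\texttt{ot1enc.def} and \texttt{t1enc.def} files as I recall them, and
should be checked against those files before being relied upon.
\begin{verbatim}
% LICR     OT1 horizontal material    T1 horizontal material
% a        \char 97                   \char 97
% \ss      \char 25                   \char 255
% \H{o}    \accent 125 \char 111      \char 174
\end{verbatim}
Note that the same LICR |\H{o}| becomes an |\accent| construction in
one encoding and a single |\char| in the other; this is the kind of
difference that the encoding is meant to hide from the author.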

In theory there is a universal mapping from LICR to interpreted text,
but various technical restrictions make it impossible to simultaneously
support the entire mapping. A \LaTeX\ encoding selects a restriction
of this mapping to a limited set which will be ``well supported''
(meaning kerning and such between characters in the set works), whereas
elements outside this set can at best be supported through temporary
encoding changes. The encoding also specifies a decomposition of the
mapping into one part which maps LICR to horizontal material and one
part which maps horizontal material to interpreted text. The first
part is realized by the text command definitions usually found in the
\meta{enc}\texttt{enc.def} file for the encoding. The second part is
the font encoding, the specification of which is the topic of this
paper. It is also worth noticing that an actual font is a mapping of
horizontal material to printed text.
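
To make the first part of the decomposition concrete, the following
sketch shows the kind of text command definitions that realize it.
The declarations are believed to match those in the standard
\texttt{ot1enc.def} and \texttt{t1enc.def} files, but the slot numbers
are quoted from memory and should be treated as assumptions.
\begin{verbatim}
% OT1: \ss is a single slot, \H is a generic accent
\DeclareTextSymbol{\ss}{OT1}{25}
\DeclareTextAccent{\H}{OT1}{125}

% T1: \ss is a single slot; \H{o} has a slot of its own,
% while \H over other letters falls back to the accent
\DeclareTextSymbol{\ss}{T1}{255}
\DeclareTextAccent{\H}{T1}{5}
\DeclareTextComposite{\H}{T1}{o}{174}
\end{verbatim}
Each declaration maps one piece of LICR to horizontal material for one
encoding only; what that horizontal material is supposed to mean once
typeset is the business of the second part, the font encoding.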

An alternative decomposition of the mapping from LICR to interpreted
text would be at the DVI command level, but even though this
decomposition is realized in most \TeX\ implementations, it has very
little relevance for the discussion of encodings. The main reason for
this is that it depends not only on the encoding of a font, but
also on its metrics. Furthermore it is worth noticing that in pdf\TeX\
there need not be a DVI command level.
%%%%%%%%%%%%%%%%%%%%

Lars Hellström