Font encoding specifications

Fri, 22 Jun 2001 22:54:31 +0200

Lately I've spent quite some time thinking about font encodings and related matters; in particular, I have considered how they should be specified. The discussions on this list over the last few months about problems related to multilinguality have been quite inspiring (though I probably wouldn't have done much about it if some other projects I've been working on hadn't also needed clarifications in this area). I have now compiled my thoughts in a paper on the matter, which can be found at

  http://abel.math.umu.se/~lars/encodings/

Comments on its contents are welcome; I'd like to see a discussion about it here on this list so that there could be an "official" acceptance or rejection of the ideas expressed therein. (In the former case, this paper might perhaps evolve into an "encguide".)

One matter in particular which I believe is of interest on this list is the following passage about the output of LaTeX and the corresponding attempt at defining what a LaTeX font encoding really is:

%%%%%%%%%%%%%%%%%%%%
On its way out of \LaTeX\ towards the printed text, a character
passes through a number of stages. The following five seem to cover
what is relevant for the present discussion:
\begin{enumerate}
  \item \emph{\LaTeX\ Internal Character Representation} (LICR)~%
    \cite{LICR}. At this point the character is a character token
    (e.g.~|a|), a text command (e.g.~|\ss|), or a combination
    (e.g.~|\H{o}|).
  \item \emph{Horizontal material;} this is what the character is
    en route from \TeX's mouth to its stomach. For most characters
    this is equivalent to a single |\char| command (e.g.\ |a| is
    equivalent to |\char|\,|97|), but some require more than one, some
    are combined using the |\accent| and |\char| commands, some
    involve rules and\slash or kerns, and some are built using boxes
    that arbitrarily combine the above elements.
  \item \emph{DVI commands;} these are the DVI file commands that
    produce the printed representation of the character.
  \item \emph{Printed text;} this is the graphical representation of
    the character, e.g.\ as ink on paper or as a pattern on a computer
    screen. Here the text consists of glyphs.
  \item \emph{Interpreted text;} this is essentially printed text
    modulo equivalence of interpretation, hence the text doesn't really
    reach this stage until someone reads it. Here the text consists of
    characters.
\end{enumerate}
In theory there is a universal mapping from LICR to interpreted text,
but various technical restrictions make it impossible to simultaneously
support the entire mapping. A \LaTeX\ encoding selects a restriction of
this mapping to a limited set which will be ``well supported'' (meaning
kerning and such between characters in the set works), whereas elements
outside this set at best can be supported through temporary encoding
changes. The encoding also specifies a decomposition of the mapping
into one part which maps LICR to horizontal material and one part which
maps horizontal material to interpreted text. The first part is
realized by the text command definitions usually found in the
\meta{enc}\texttt{enc.def} file for the encoding. The second part is
the font encoding, the specification of which is the topic of this
paper. It is also worth noting that an actual font is a mapping of
horizontal material to printed text.

An alternative decomposition of the mapping from LICR to interpreted
text would be at the DVI command level, but even though this
decomposition is realized in most \TeX\ implementations, it has very
little relevance for the discussion of encodings. The main reason for
this is that it depends not only on the encoding of a font, but also on
its metrics. Furthermore, it is worth noting that in pdf\TeX\ there
need not be a DVI command level.
%%%%%%%%%%%%%%%%%%%%

Lars Hellström
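
For concreteness, the first part of the decomposition (LICR to horizontal material) might be sketched as follows for the T1 encoding. This is only an illustration and not text from the paper: the declarations imitate the kind found in t1enc.def, and the slot numbers (255, 5, 174) are given from memory as assumptions that should be checked against that file.

%%%%%%%%%%%%%%%%%%%%
\documentclass{article}
\usepackage[T1]{fontenc}
% Illustrative declarations in the style of an encoding definition file.
% They repeat what fontenc is assumed to have set up already for T1;
% the slot numbers are assumptions, not taken from the paper.
\DeclareTextSymbol{\ss}{T1}{255}       % LICR \ss   -> a single \char (slot 255)
\DeclareTextAccent{\H}{T1}{5}          % LICR \H{x} -> \accent 5 followed by x
\DeclareTextComposite{\H}{T1}{o}{174}  % LICR \H{o} -> a single \char (slot 174)
\begin{document}
% A character token, a text command, a composite, and an \accent fallback:
a \ss\ \H{o} \H{x}
\end{document}
%%%%%%%%%%%%%%%%%%%%

Here |a| and |\ss| each become a single |\char|, |\H{o}| becomes a single |\char| because a composite is declared for it, and |\H{x}| falls back on the |\accent| construction since no composite exists. The second part of the decomposition, which says what the material in those slots is to be interpreted as, is what a font encoding specification would have to pin down.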