Since Frank and I seem mainly to have been discussing terminology lately, I suppose the rest of the recipients of this list might well feel a bit left out. Furthermore, since I guess quite a few of you haven't bothered (I don't think I would have) to download my main text on the subject (the relenc package documentation), I thought it better to do this for you. What follows is an excerpt containing the most relevant part.

Lars Hellström

\documentclass[a4paper]{ltxdoc}
\newcommand\B{\penalty300\relax}
\newcommand\package[1]{\textsf{#1}}
\title{The \package{relenc} package}
\author{Lars Hellstr\"om}
\begin{document}
\maketitle

\section{Motivation}
\label{Motivation}
%
This paper is about some shortcomings that, in my humble opinion, exist in the way \LaTeX\ handles fonts. I also point out a way in which these shortcomings can be overcome.

The primary problem is ligatures, but as there are a few different ligature concepts of interest, let me begin by specifying my terms. A \emph{ligature} is a sequence of characters (almost always letters) that have been given an appearance somewhat different from the one the characters would have if simply put side by side, almost always because they would otherwise not look very pleasing to the eye. Despite this difference in appearance, it is still meant to be read as the entire character sequence, not as a completely new character. The canonical example of this is the `fi' ligature. In \TeX\ fonts, there is a special mechanism to implement this, and everything that is implemented using this mechanism will be called \emph{font ligatures}. It is almost always the case, however, that some font ligatures are not ligatures as defined above, but simply a handy way to type characters that are hard or impossible to type on a standard keyboard; the canonical example of this is the `\texttt{--}' (two hyphens) to `--' (endash) conversion that is present in most \TeX\ fonts.
Such nonproper ligatures will be called \emph{syntactic ligatures}, and proper ligatures will sometimes be called \emph{aesthetic ligatures} to stress their origin.

A \emph{font-dependent command} in \LaTeX\ is a command whose actions depend directly or indirectly on which font is the current one. (I would not consider a command |\foo| defined by
\begin{verbatim}
\def\foo{\char65 }
\end{verbatim}
to be a font-dependent command, since it always does the same thing. The results need not always be identical, but that is because the command is executed under different conditions: slot 65 of one font may hold a different glyph than slot 65 of another.) An example of a font-dependent command is |\"|, which is (roughly) |\accent 127| when the current font is \texttt{OT1}-encoded and |\accent 4| when the current font is \texttt{T1}-encoded. (The dependence is indirect, since the command directly depends only on a macro that is set during the font selection process, but it is a dependence nonetheless.) For the purposes of this paper, it would also suffice to define a font-dependent command as a command that is defined by one of the commands |\DeclareTextCommand|, |\ProvideTextCommand|, |\DeclareTextSymbol|, |\Declare|\B|Text|\B|Command|\B|Default|, |\Provide|\B|Text|\B|Command|\B|Default|, or |\Declare|\B|Text|\B|Accent|. \LaTeX\ documentation uses the term `encoding-specific command' for these, but for reasons that will soon be apparent, that term would be somewhat inappropriate here.

With these definitions taken care of, it is now time to get to the point. The recommended Latin font encoding these days is the \texttt{T1}\slash`Cork'\slash`Extended \TeX\ text' encoding, and rightfully so. It is clearly superior to the old \texttt{OT1} encoding, as it adds more than a hundred accented characters to those that can be used in words that \TeX\ can hyphenate automatically, but there is at least one case in which the \texttt{OT1} encoding is preferable: when the font has many ligatures.
In the \texttt{T1} encoding, there are seven slots available for ligatures, and these have been assigned to the `ff', `fi', `fl', `ffi', `ffl', `IJ', and `ij' ligatures. Since every slot has been assigned to something, there is no place to put an additional ligature, even if one is needed. The conclusion is thus that a \texttt{T1}-encoded font cannot contain any ligatures besides the aforementioned ones; put the other way round, if a font design requires a ligature other than these, the font cannot be \texttt{T1}-encoded. In the \texttt{OT1} encoding, only five slots are assigned to ligatures, but there are 128 unassigned slots that can be used for anything the font designer wants. Having more than five ligatures in an \texttt{OT1}-encoded font is therefore no problem, but falling back on \texttt{OT1} is not a very good option, as it leaves the hyphenation problem unsolved.

The solution, then, would seem to be the creation of a new encoding, and that will indeed be part of it, but it will not be quite sufficient, for reasons I shall describe shortly. For the moment, though, let us, as a thought experiment, assume that we solve the problem of \texttt{T1} having too few slots for ligatures by creating a new encoding for a hypothetical font that needs more than seven ligatures. Let us also assume that the new encoding is a modified version of the \texttt{T1} encoding, in which some accented characters have been left out to make room for the ligatures. Finally, let us assume that we want to be as international as possible and include as many of the accented characters as we can squeeze in. These are three simple assumptions, and there are good reasons for all of them. How \emph{many} slots do we need to assign to ligatures, then? This varies, of course, between different font families, but it might vary \emph{even more} between fonts in the same family.
The \texttt{it} shapes might need a few more than the \texttt{n} shapes, while the \texttt{sc} shapes might not need any at all (`\textsc{fi}' (|fi|) and `\textsc{f{}i}' (|f{}i|) look exactly the same in most font families). On the other hand, some accents are harder to place in the \texttt{sc} shapes (in many font families the ring on \textsc{a} in \textsc{\r{a}} should touch the main letter, which is not what the default definition does), so it appears that the optimal thing to do would be to have slightly different encodings for different fonts, even if they belong to the same family.

In theory this is no problem; \TeX's macro facilities are flexible enough to allow user-level commands that do different things in different fonts. The problem is rather to do this in a reasonably universal way, so that the resulting macros work in general and not only for a single font family. Standard \LaTeX\ has a mechanism for doing precisely this: using the commands |\DeclareTextCommand|, |\DeclareTextSymbol|, |\DeclareTextAccent|, or one of their relatives, one can define a command that is used with one particular font encoding and not with any other. The problem with using this mechanism here is that the upright and italic variants of a family might have to be declared with different encoding attributes (as well as different shapes). One would then have to either devise a whole new set of font-changing commands or redefine \LaTeX's own high-level font-changing commands (such as |\textit|) to change the encoding as well as the shape or series. Neither alternative is good, and one can expect several incompatibility problems to arise with either of them.

A better solution starts with recognizing that there are actually two different `encoding' concepts involved here. One is the attribute by which fonts are selected in \LaTeX; the other is the actual layout of a font.
I will call this latter concept a \emph{coding scheme} and reserve \emph{encoding} for the former. (Formally, one may start by defining a \emph{slot} to be an integer in the range 0--255 and a \emph{glyph} to be a pattern, usually recognizable as a letter, digit, punctuation mark, or some other part of written language, though it need not always be. A coding scheme can then be defined as a mapping of slots to classes of glyphs%
\footnote{The reason a coding scheme maps to classes of glyphs, rather than just to glyphs, is that a glyph is defined as a pattern and there are usually many patterns which serve equally well as, for example, the letter `a'. The class for `a' contains all a's in all fonts. One would furthermore expect it to contain all A's (for the sake of all-caps fonts) and all Asmall's (for the sake of c\&sc fonts).}.
A font complies with a particular coding scheme if, for every slot $n$ in the domain of the coding scheme, the glyph occupying slot $n$ of the font is a member of the class that the coding scheme maps $n$ to. But I digress.)

As far as I know, there is no strict definition of what an encoding is, apart from the operational one given in \cite{fntguide}, where an encoding is simply something that is part of the specification of a font. (The canonical source for such a definition would be \cite{encguide}, but that paper is, according to its author, ``still in an embryo state''.) In font discussions, an encoding is often taken to imply a specific coding scheme, and many encoding definition files seem to be all about listing the coding scheme, but is this implication appropriate? I would claim that in this case it is not. A more constructive definition would be to see an encoding as a specification of which font-dependent commands are available to the author. An encoding definition file, on the other hand, is a specification of the interface between \LaTeX\ macros and the information in a \TeX\ font.
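
To make the distinction concrete, the following lines sketch how the standard encoding definition files declare |\H|. They are recalled in abbreviated form from \texttt{ot1enc.def} and \texttt{t1enc.def}, which contain many more declarations. In \texttt{OT1} the command can only have \TeX\ build the accented letter with |\accent|, whereas in \texttt{T1} it also has a precomposed \H{o} at its disposal:
\begin{verbatim}
% OT1: no precomposed o with double acute, so \H puts the
% accent in slot 125 on top of the letter with \accent.
\DeclareTextAccent{\H}{OT1}{125}
% T1: the stand-alone accent is in slot 5, but for the letter o
% the font also provides a precomposed glyph in slot 174.
\DeclareTextAccent{\H}{T1}{5}
\DeclareTextComposite{\H}{T1}{o}{174}
\end{verbatim}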
It does not matter to the author whether \H{o} is |\char174| of the current font, is generated as |\accent125o| by \TeX, or is produced in some other way. The only thing that matters is that when the author types |Erd\H{o}s|, it comes out as Erd\H{o}s. Consequently, there is really no need for the font-dependent commands in \LaTeX\ to do the same thing for any two fonts with the same encoding attribute; it is merely the case that standard \LaTeX\ does not offer an interface for defining font-dependent commands in any other way. The natural remedy, then, would be to write a package that offers such an interface. This is what I have done; the package is called \package{relenc} and this paper is its documentation. Its usage and implementation are described in the following sections, and the appendices describe some accompanying files.

I shall, however, conclude this section with an attempt to elaborate the above view of what an encoding is, or perhaps rather, what it should be. The encoding property of a font is a set of rules that determines how the author's manuscript is interpreted. The input character \texttt{q}, for example, does not have the same interpretation in a \texttt{T1}-encoded font (where it is the letter `q') as in an \texttt{OT2}-encoded font (where it is a Cyrillic letter whose closest Latin equivalent is the Czech `\v{c}'). An encoding specification should therefore be a formalization of an agreement between the font designer on the one hand and the author on the other: it specifies which rules each side must comply with and which results can then be expected. One of the author's rules may be to refrain from writing \TeX\ code like |\char 166|, because the font designer may be free to choose what to put in that slot. If the author breaks the rules, he or she may find that the manuscript contains text whose meaning differs when typeset with two different fonts, even if they have the same encoding property.
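
As an illustration of this rule, consider producing the letter e with ogonek in a \texttt{T1}-encoded document. The first input line below depends on the coding scheme, since slot 166 holds that letter only in fonts that follow the \texttt{T1} layout. The second depends only on the encoding, since |\k| is a font-dependent command. This sketch assumes a \LaTeX\ release whose \texttt{t1enc.def} defines |\k| (recent releases do).
\begin{verbatim}
% Probably breaks the author's rules: the result depends on
% which glyph the font designer happened to put in slot 166.
\char166
% Follows the rules: \k is a font-dependent command, so the
% font designer decides how e with ogonek is produced.
\k{e}
\end{verbatim}
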
In practice, the author's rules for the standard text encodings are pretty much the same as the rules on how to write \TeX\ code that we find in every elementary book on the subject, so they are hardly new to us. One of the font designer's rules may be to put an exclamation mark in slot 33, so that \texttt{!} actually prints as one. Another may be to include a font ligature that converts two consecutive hyphens to an endash, so that |--| actually prints as an endash, as the author by tradition expects it to. If the font designer breaks the rules, then authors who follow their own rules might find that they do not get the right results anyway, and such a font designer is likely to get complaints from authors about this. In practice, however, the font designer's rules are often vaguely specified, if specified at all, and hence most encodings have gray areas where there is no right or wrong. The \texttt{OT1} encoding is probably the one most plagued by these, and the dollar versus sterling problem is a classic. Slot 36 holds a dollar sign in the Computer Modern roman fonts but a sterling sign in the text italic ones, which is an excellent example of how changing the glyph of a single slot may completely alter the interpretation of a text. One of my intentions in writing this text is to help shrink these gray areas or even eliminate them completely. I do not, however, think anything can be done for the \texttt{OT1} encoding; its irregularities are much too well known and exploited.

Now, if an encoding is (a formalization of) an agreement, how do the parties agree to it? On the font designer's side, this happens when the font designer gives a font a specific encoding by writing a font definition file that defines that font with that encoding. On the author's side, this happens when the author selects a font with that encoding property. So much for the informal description; now it is time to get to the formalization. What exactly are the rules for the author and for the font designer? This varies between different encodings, but only in the details.
The areas that an encoding specification must cover can be listed as follows:
\begin{itemize}
\item Which input characters can be used directly to produce some of the font's glyphs in the output, and what they will generate. This pertains to the author, who should not use other input characters; the allowed ones, however, have well-defined results.
\item Which coding scheme the font must comply with. This pertains to the font designer. There are no direct restrictions on the use of slots not listed in this coding scheme.\footnote{There may be indirect restrictions; see below.}
\item Which syntactic ligatures are required. This pertains to both the author and the font designer: the author cannot rely on any others, and the font designer must include these.\footnote{It could well be that there \emph{should not} be any syntactic ligatures in addition to these. I know of no situation where there would be an advantage in adding syntactic ligatures.}
\item Which font-dependent commands are available and what they will generate. This pertains to the author in the same manner as do the input character rules.
\item Which font dimensions are required and what they stand for. This pertains to both the author and the font designer in the same manner as do the syntactic ligature rules.\footnote{Even though very few human authors access font dimensions directly, the same does not hold for packages, and these also count as authors in this context.}
\end{itemize}
Once these have been specified, the gray areas should be very small indeed! There are, however, a few additional twists that must be sorted out. If the required coding scheme listed in the encoding specification does not cover all 256 slots, then one must be aware that the font's coding scheme may face further restrictions, beyond those expressed by the required coding scheme itself. These may come from the required syntactic ligatures in particular, but also from the font-dependent commands.
These restrictions are of the form that a glyph from a specific class must be assigned to some slot, while the font designer may freely choose exactly which slot. A required `\texttt{--}' ligature, for example, means that the font must contain an endash somewhere, but not in which slot. Thus any single slot not specified by the required coding scheme may be used for just about anything. The use of the \package{relenc} package requires that the following area be added to the ones listed above:
\begin{itemize}
\item The font designer must see to it that, for every combination of a variable command and a font, there is a variant that gives the specified result.\footnote{The terms \emph{variable command} and \emph{variant} are explained in the complete documentation.}
\end{itemize}
Hyphenation patterns also pose theoretical problems for the use of the \package{relenc} package, as they refer explicitly to the coding scheme of the font. Problems with these cannot, however, result in anything worse than bad hyphenation, so the interpretation of a text should not be affected. Furthermore, in practice the problems can often be avoided (the complete documentation treats this topic in more detail). Finally, there are two font parameters, |\hyphenchar| and |\skewchar|, that relate explicitly to the coding scheme of the font but are not stored in the font itself. The former is the character \TeX\ inserts where it breaks a word at a hyphenation point; the latter is the character whose kerns \TeX\ uses to position math accents. It is possible that the value of at least one of these should be given in an encoding specification. That particular question is not of immediate interest to the \package{relenc} package, however, as \LaTeX\ itself already lets the font designer set these for each font individually (using the sixth argument of |\Declare|\B|Font|\B|Shape|).
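
For illustration, a font definition file might use that sixth argument as sketched below. The family name \texttt{xyz}, the file name \texttt{t1xyz.fd}, and the font name \texttt{xyzri8t} are hypothetical, and the values are merely examples; a |\skewchar| of $-1$ means that the font has no skew character.
\begin{verbatim}
% Hypothetical font definition file t1xyz.fd.
\DeclareFontFamily{T1}{xyz}{}
% The sixth argument is executed whenever this shape is loaded,
% so parameters not stored in the font can be set per font here.
\DeclareFontShape{T1}{xyz}{m}{it}{<-> xyzri8t}{%
  \hyphenchar\font=45 % hyphenate with the character in slot 45
  \skewchar\font=-1   % no skew character for this text font
}
\end{verbatim}
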
\section{Usage}
\subsection{Author usage}
All the author has to do to use fonts with a relaxed encoding, as opposed to fonts with, for example, the \texttt{T1} encoding, is to include the command
\begin{verbatim}
\usepackage{relenc}
\end{verbatim}
in the preamble and to load the encoding definition file, for example using the \package{fontenc} package. It is, however, important that the \package{relenc} package be loaded \emph{before} the encoding definition file, as the latter uses commands defined in the former.

\begin{thebibliography}{99}
%
\bibitem{ltoutenc} Johannes Braams, David Carlisle, Alan Jeffrey, Frank Mittelbach, Chris Rowley, Rainer Sch\"opf: \texttt{ltoutenc.dtx} (part of the \LaTeXe\ base distribution).
%
\bibitem{fontinst} Alan Jeffrey, Rowland McDonnell (manual), Sebastian Rahtz, Ulrik Vieth: \emph{The fontinst utility} (v\,1.8), \texttt{fontinst.dtx}, in CTAN at \texttt{ftp:/\slash ftp.tex.ac.uk\slash tex-archive\slash fonts\slash utilities\slash fontinst\slash}\textellipsis
%
\bibitem{fntguide} \LaTeX3 Project Team: \emph{\LaTeXe\ font selection}, \texttt{fntguide.tex} (part of the \LaTeXe\ base distribution).
%
\bibitem{encguide} Frank Mittelbach [et al. ?]: \texttt{encguide.tex}. To appear as part of the \LaTeXe\ base distribution. Sometime. Or at least, that is the intention.
%
\end{thebibliography}
\end{document}