## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


 Subject: A Language Model for LaTeX (2/2) From: Frank Mittelbach Reply To: Mailing list for the LaTeX3 project Date: Fri, 20 Jun 1997 16:42:29 +0200
\iffalse

part two of the paper

\fi

\subsection{Formatting}

Although each of the examples listed here has been documented as
characteristic of the typography associated with a particular
language, they are all also aspects of the design over which a
document designer may wish to have control that is independent of
the language of the text.

\paragraph{Direction}
The direction of the text and, more generally, the writing system used
are very strongly associated with the language in use.

\paragraph{Micro-rendering}
This covers the details of rendering at the level of individual glyphs
and the relationships, often complex, between the characters which
form the textual part of the logical document and the glyphs used to
render this text, especially when aiming for the highest levels of
typographic quality.  These details often depend on what glyphs are
provided by the available fonts.  Also, when using \TeX, this level of
formatting is typically controlled entirely by the choice of font,
whereas it should be possible to specify such details independent of
the font since they also depend on the language in use.

Some examples:
\begin{itemize}
\item The precise positioning of diacritics depends on the
language; e.g.,~a language such as German with many umlauts puts
them closer to the top of the basic letter than is typically done
with the diaeresis in English or French typography.
\item The use of aesthetic ligatures varies from language to language,
e.g.,~the ffl-ligature is traditionally not used in Portuguese and
Turkish typography (implementing this is difficult in \TeX{} since
these transformations are normally controlled entirely by the font
and there is no simple way to `turn them off').
\end{itemize}
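
As a minimal sketch of one workaround available in current \TeX{}
(it is not part of the model proposed here), an explicit zero-width
kern between the letters stops the font from forming the ligature.
The helper name \verb|\nolig| is an assumption made purely for this
illustration.

\begin{verbatim}
\documentclass{article}
% \nolig is a hypothetical helper name, not a standard command.
% An explicit zero-width kern separates the characters, so the
% font's ligature program cannot combine them.
\newcommand*\nolig{\kern0pt }
\begin{document}
baffle                 % ffl ligature chosen by the font
baf\nolig f\nolig le   % three separate glyphs: f, f, l
\end{document}
\end{verbatim}

Such markup has to be inserted by hand in every affected word, and
an explicit kern may also restrict the hyphenation of that word; this
is the kind of cost that control independent of the font would avoid.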

\paragraph{Macro-rendering}
More global aspects of typography can also be language-dependent, for
example:
\begin{itemize}
\item the formatting of in-line quotes (i.e.~what `quotation marks' to
use);
\item rendering of enumerations;
\item aspects of page layout (e.g.,~float placement).
\end{itemize}

As with most language-related actions, they usually have a wide range
of formatting possibilities and can be considered to depend, at least
partially, on house style or other factors.

\section{Attaching Actions to Change of Language}

Having described some typical changes that need to be made at a
language tag, we now look at how to tie particular actions to a
particular tag, noting that it is not sensible, for example, to change
every aspect of the formatting if only an in-line fragment of a few
words is to be in a different language.

\subsection{Attaching actions to tags}

First we note the following facts.
\begin{itemize}
\item The type of actions that are required at language tags can be
modeled by setting the values of a collection of parameters to those
appropriate for the new language.

\item Some actions may not make sense at certain levels of the
hierarchies. For example, while one wants to use the correct
hyphenation algorithm at any level of the hierarchies, a change of
micro-rendering, such as the positioning of diacritics, might be
applied only to language changes for whole paragraphs but not for
fragments.
\item However, for most actions it is not possible to specify one
place in the hierarchies that will produce the correct location of
that action for \emph{all} documents.  The correct place might, for
example, depend on document type or on a particular house style.

\end{itemize}

There are at least two possibilities for specifying, for a
particular document, where in the tag hierarchy an action should be
`attached' (see Figure~\ref{fig:twohs}).  These are by the
nesting-level in the hierarchy of language tags or by the visual type
of the language tags as described in section~\ref{sec:visual}.  These
visual tag-types implicitly define a partial hierarchy, from the top:
document, base, block, fragment.

In both cases an action is defined to be executed down to a prescribed
level in the hierarchy.  As noted above, different actions might be
executed down to different levels so there needs to be a mechanism to
specify this level for each action.  To limit the complexity of the
model we think it is advisable to assume that this stopping level
depends on the action but not on the language.  It was pointed out in
Tsukuba that this is probably an oversimplification, i.e.~that there
exist cases where it would be better to model the formatting of
language-related items by attaching language/action pairs to
levels.  However, we think that these cases are sufficiently rare
that they can be handled by the action itself.\footnote{An action that
depends both on language and level could be specified in the model
by executing it on all levels with an additional conditional within
the action body testing for the current language.}

It is also possible to combine these two hierarchies and allow the
attachment of actions to tags via either hierarchy (see
Figure~\ref{fig:THD}).  In this case, for each action it is necessary
to define:
\begin{itemize}
\item on which of the two hierarchies the stopping of the action depends;
\item down to what level the action is carried out in that hierarchy.
\end{itemize}

\subsection{Data structures for this model}

For this model of language tags/actions, the system needs to specify the
contents of the following three data structures.

\subsubsection{Tag hierarchy diagram (THD)}

While combining the two hierarchies we have simplified their structure
(compare Figures~\ref{fig:twohs} and~\ref{fig:THD}), i.e.~multiple
nestings of paragraphs are collapsed into a single node.
At the same time a new root node (document-level) was added. This node
serves as an anchor point for typographic requirements that should
stay fixed throughout the document even if the base language changes.

\begin{figure}
\centering
\setlength\unitlength{10pt}
\frame{%
\begin{picture}(23,12)(-2,-1)

\drawline(10,10)(10,8)(8,6)(8,2)(10,0)
\dottedline[$\bullet$]{2}(10,10)(10,8)
\dottedline[$\bullet$]{1}(8,6)(8,2)
\dottedline[$\bullet$]{2}(10,0)(10,0)

\drawline(10,8)(12,6)(12,2)(10,0)
\dottedline[$\bullet$]{4}(12,6)(12,2)

\multiputlist(8.5,10)(0,-2)[r]{document level,base language level}

\multiputlist(7,6)(0,-1.5)[r]{first nesting level,second nesting
level,\ldots}

\multiputlist(8.5,0)(0,-2)[r]{n\textsuperscript{th} nesting level}

\multiputlist(13,6)(0,-4)[l]{block level,fragment level}
\multiputlist(11.5,0)(0,-4)[l]{nested fragment level}

\end{picture}%
}
\caption{Tag hierarchy diagram (THD)}
\label{fig:THD}
\end{figure}

The required number of significant nestings in the hierarchy of
nesting-levels is an open question but probably $n=3$ is
sufficient to specify typical formatting requirements.

The two end points of the hierarchies (n\textsuperscript{th}
nesting-level and nested-fragment-level) are combined because both
essentially mean that attached actions are carried out in all cases;
thus it does not matter on which hierarchy they are specified.

Another interesting point is that the base-language-levels of both
hierarchies are combined.\footnote{From this it follows that in this
model a base language change is only allowed between paragraphs.}

Nevertheless, it should be noted that the ``level'' of a tag within
the THD is logically described by a pair of nodes (one on each
hierarchy) even though in some cases these nodes collapse into one.

\subsubsection{Language actions table (LAT)}

This two-dimensional table (indexed by parameter-group and
language-label) stores the effect of each action (i.e.~the value for a
parameter-group) for each language (possibly only a default value if
no value has been explicitly defined for that language).  Each entry
is an expression that returns a set of values appropriate to the
parameter-group.

It may be possible\footnote{Such details can have large effects
on the efficiency of the implementation, thus we are being cautious
here.} to also allow special actions to be specified, such as:
\begin{itemize}
\item leave unchanged;
\item use some default (e.g.~the value for the document language).
\end{itemize}
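
The table lookup with a default value can be sketched in today's
\LaTeX{} as follows.  This is only an illustration under stated
assumptions: the command names \verb|\DeclareLATEntry| and
\verb|\UseLATEntry|, the parameter-group names and the label
\texttt{default} are all hypothetical, and each entry is simplified
to a single replacement text rather than a general expression.

\begin{verbatim}
\documentclass{article}
\usepackage[T1]{fontenc}
\makeatletter
% All command names below are hypothetical illustrations.
% \DeclareLATEntry{<parameter-group>}{<language-label>}{<value>}
\newcommand\DeclareLATEntry[3]{\@namedef{lat@#1@#2}{#3}}
% \UseLATEntry{<parameter-group>}{<language-label>}
% Uses the entry for the language, or the default entry if the
% language has no entry of its own.
\newcommand*\UseLATEntry[2]{%
  \@ifundefined{lat@#1@#2}%
    {\@nameuse{lat@#1@default}}%
    {\@nameuse{lat@#1@#2}}}
\makeatother
\DeclareLATEntry{openquote}{default}{\textquotedblleft}
\DeclareLATEntry{closequote}{default}{\textquotedblright}
\DeclareLATEntry{openquote}{german}{\quotedblbase}
\DeclareLATEntry{closequote}{german}{\textquotedblleft}
\begin{document}
\UseLATEntry{openquote}{german}Zitat\UseLATEntry{closequote}{german}
and
\UseLATEntry{openquote}{french}quote\UseLATEntry{closequote}{french}
\end{document}
\end{verbatim}

In this sketch the request for \texttt{french}, which has no entry of
its own, falls back to the \texttt{default} column of the table.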

\subsubsection{Parameter assignment map (PAM)}
This one-dimensional table maps each action (modeled by a
parameter-group) to a single node in the THD.

Such an assignment means that this parameter group changes its value
(using the method specified in the LAT) at all levels down to (and
including) the node to which it is mapped.
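
A minimal sketch of such a map, again with hypothetical command
names, is given below.  It assumes a numeric encoding of the THD that
is not part of the model itself: depth 0 is the document level and 1
the base language level; on the nesting hierarchy 2, 3 and 4 stand
for the first to third nesting levels, and on the visual hierarchy 2
is the block level, 3 the fragment level and 4 the nested fragment
level.  The level of a tag is passed as a pair of depths, one for
each hierarchy.

\begin{verbatim}
\documentclass{article}
\makeatletter
% All command names are hypothetical; the depth numbering is an
% assumption of this sketch (larger number = deeper in the THD).
% \DeclarePAMEntry{<parameter-group>}{<hierarchy>}{<depth>}
\newcommand*\DeclarePAMEntry[3]{%
  \@namedef{pam@#1@hier}{#2}%
  \@namedef{pam@#1@depth}{#3}}
% \IfActionApplies{<group>}{<nesting depth>}{<visual depth>}
%                 {<yes>}{<no>}
% Only the hierarchy named in the PAM entry is consulted; the
% action applies unless the tag lies deeper than the mapped node.
\newcommand\IfActionApplies[5]{%
  \def\tag@depth@nesting{#2}%
  \def\tag@depth@visual{#3}%
  \ifnum\@nameuse{tag@depth@\@nameuse{pam@#1@hier}}>%
        \@nameuse{pam@#1@depth}\relax
    #5%
  \else
    #4%
  \fi}
\makeatother
\DeclarePAMEntry{hyphenation}{visual}{4}% down to the end point
\DeclarePAMEntry{diacritics}{visual}{2}%  down to block level only
\begin{document}
Fragment tag (nesting depth 2, visual depth 3):
hyphenation \IfActionApplies{hyphenation}{2}{3}{changes}{stays},
diacritics \IfActionApplies{diacritics}{2}{3}{changes}{stays}.
\end{document}
\end{verbatim}

With these two entries a language tag at fragment level changes the
hyphenation parameters but leaves the positioning of diacritics as it
was, matching the example given earlier for actions that stop at
different levels.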

\section{Special Regions}\label{sec:moving}

The scheme we have outlined so far will work well for the main text of
many documents but it needs to be supplemented in order to handle
formatting of the following material (called special regions):
\begin{itemize}
\item regions that contain text which has moved from other parts of
the document;
\item regions of text that are first formatted and then the whole
block is moved, e.g.,~(from \LaTeX) floating tables, footnotes;
\item regions that can contain elements breaking the type hierarchy,
e.g.,~paragraphs in table-cells.
\end{itemize}

There are several problems that arise when ``moving things around'' in
a document: one of these, which arises only when logical (unformatted)
text is being moved, is the need to move language information with the
moving text.  This is needed even if the text being moved is in the
document language since this may not be the current language at the
point to which it moves.  Thus the data-type for `logical stuff being
moved' must be the text and a language-label (describing its
language).

\subsection{Formatting special regions}

A problem that affects the formatting of all special regions is how
to specify the language to be used and the effective level of language
tags contained within the special region.  It is not possible to simply
extend the THD and PAM from the main part of the document since these
assume that the nesting of the language tags in the logical document
is faithfully represented in the formatted document.  This is very
clearly not the case with regions such as floats or end-notes which
appear visually in totally unrelated parts of the document.  It is
also not true for paragraphs within tables since these can be,
logically, paragraphs within paragraphs, and our classification of
language tags into types does not allow for this.

One possible solution to this problem is to allow the specification of
a local PAM for each type of special region.  This requires:
\begin{itemize}
\item a method to set the starting-language for the region;
\item the specification of a local PAM for the region.
\end{itemize}

The disadvantage of this solution is its inherent complexity: for each
special region the designer of a document class needs to specify a
full mapping of all language-related actions to the tag hierarchy (the
local PAM).  Since the numbers of both the special regions and the
language-related actions are potentially unlimited, this would result
in either a very complex set-up mechanism or the use of general
defaults (e.g., the local PAM nearly always inherits from the global
document PAM) in which case the solution is unnecessarily complicated.

\subsection{A practical solution}

A simpler solution is to use the PAM from the main document but to
allow the specification, for each type of special region, of how the
information from the PAM is used.  This would be done by specifying
the following:
\begin{itemize}
\item a method to set the starting-language for the region;
\item the actual initialisation-level (init-level) for the change to
this starting language;
\item the effective level (inner-level), as far as imbedded tags are
concerned, of this change to the starting-language for the region.
\end{itemize}
We now give an expanded description of these items.

\paragraph{Starting language}
In the case of special regions that receive unformatted text the
starting-language will directly affect only the text generated by the
region's tags themselves as each bit of received text will carry its
own language label (see section~\ref{sec:moving}).  In the case of
regions that move after being formatted it defines the default
language used when formatting this region.

\paragraph{Initialization}
At the start of the region, actions are executed as if the region
started with a tag whose level (in the THD, i.e.~a pair of nodes) is
init-level using this starting-language.  This results in setting
parameters to values suitable for that starting-language whilst
allowing for a region to move to a different visual context.

\paragraph{Inner processing}
Within the region, language tags are processed as if the region
started with a tag whose level (in the THD) is inner-level
(inner-level must be at least as deep\footnote{An alternative model
would be to also allow inner-level to be one less than init-level.
This would mean that language tags within the special region are
acting as language changes on the same level as the starting
language of the region.}
as init-level in the THD).  This allows finer control over the subset
of actions executed at imbedded language tags.
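
The three items could be recorded per region type as in the following
sketch.  The names \verb|\DeclareSpecialRegion| and
\verb|\RegionSetting| and the package name \texttt{langsketch} are
hypothetical, the starting-language is simply given as a fixed label,
and each level is simplified to a single depth number (larger meaning
deeper in the THD) instead of a pair of nodes.

\begin{verbatim}
\documentclass{article}
\makeatletter
% Hypothetical interface; levels are simplified to one depth number.
% \DeclareSpecialRegion{<region>}{<starting-language>}
%                      {<init-level>}{<inner-level>}
\newcommand*\DeclareSpecialRegion[4]{%
  \ifnum#4<#3\relax
    \PackageError{langsketch}%
      {inner-level of region #1 is above its init-level}%
      {The inner-level must be at least as deep as the init-level.}%
  \else
    \@namedef{region@#1@lang}{#2}%
    \@namedef{region@#1@init}{#3}%
    \@namedef{region@#1@inner}{#4}%
  \fi}
% \RegionSetting{<region>}{lang|init|inner} retrieves a stored value.
\newcommand*\RegionSetting[2]{\@nameuse{region@#1@#2}}
\makeatother
\DeclareSpecialRegion{footnote}{english}{2}{3}
\begin{document}
Footnotes start in \RegionSetting{footnote}{lang},
init-level \RegionSetting{footnote}{init},
inner-level \RegionSetting{footnote}{inner}.
\end{document}
\end{verbatim}

The check in the declaration enforces only the constraint stated
above, that inner-level is at least as deep as init-level; the
alternative model mentioned in the footnote would need a weaker test.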

\section{Interfaces for the Rendering Model}

The following interfaces will be provided for use by writers of class
and package files:
\begin{itemize}
\item specifying the THD (this will probably be fixed, at least in the
first version);
\item specifying entries in the PAM;
\item specifying entries in the LAT;
\item specifying explicitly that a language-command
(i.e.~parameter-group) will potentially be used by the current
package or class\footnote{These declarations allow the local
customizations for all language actions to be stored in one place
(e.g.,~PAM or LAT modifications); the system can then select only
those that are actually needed for the current document.};
\item specifying the starting-language and init/inner levels
for special regions;
\item handling language information for moving text.
\end{itemize}

In addition to the new commands and environments outlined in
Section~\ref{sec:newuser}, the following interfaces will be provided
for use in documents (the first two must be in the preamble):
\begin{itemize}
\item specifying the document-language;
\item specifying all the languages used in a document;
\item possibly an interface for overriding the starting language of
a particular special region.
\end{itemize}
The second item above is not strictly necessary as the information can
be obtained by processing the document; however, a large saving of
time and space can be made if the full list of languages actually used
is specified in the preamble.

\end{document}