LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


LATEX-L  June 1997


Subject:

A Language Model for LaTeX (2/2)

From:

Frank Mittelbach <[log in to unmask]>

Reply-To:

Mailing list for the LaTeX3 project <[log in to unmask]>

Date:

Fri, 20 Jun 1997 16:42:29 +0200

\iffalse

   part two of the paper

\fi


\subsection{Formatting}

Although each of the examples listed here has been documented as
characteristic of the typography associated with a particular
language, they are all also aspects of the design over which a
document designer may wish to have control that is independent of
the language of the text.

\paragraph{Direction}
The direction of the text and, more generally, the writing system used
are very strongly associated with the language in use.

\paragraph{Micro-rendering}
This covers the details of rendering at the level of individual glyphs
and the relationships, often complex, between the characters which
form the textual part of the logical document and the glyphs used to
render this text, especially when aiming for the highest levels of
typographic quality.  These details often depend on what glyphs are
provided by the available fonts.  Also, when using \TeX, this level of
formatting is typically controlled entirely by the choice of font,
whereas it should be possible to specify such details independent of
the font since they also depend on the language in use.

Some examples:
\begin{itemize}
\item The precise positioning of diacritics depends on the
  language; e.g.,~a language such as German with many umlauts puts
  them closer to the top of the basic letter than is typically done
  with the diaeresis in English or French typography.
\item The use of aesthetic ligatures varies from language to language,
  e.g.,~the ffl-ligature is traditionally not used in Portuguese and
  Turkish typography (implementing this is difficult in \TeX{} since
  these transformations are normally controlled entirely by the font
  and there is no simple way to `turn them off').
\end{itemize}
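
As a small illustration of the kind of workaround currently available,
a single ligature can be suppressed by hand with the italic correction
\verb|\/|.  This is only a sketch of a per-occurrence fix, not a
language-level switch, and it assumes a font that actually provides
the ligatures in question.

\begin{verbatim}
\documentclass{article}
\begin{document}
shuffle        % f, f and l may be replaced by ligature glyphs
shuf\/f\/le    % \/ between the letters prevents the ligatures
               % (and adds the italic correction of f, if any)
\end{document}
\end{verbatim}

Having to mark every occurrence in this way is exactly what a
language-dependent micro-rendering parameter would avoid.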

\paragraph{Macro-rendering}
More global aspects of typography can also be language-dependent, for
example:
\begin{itemize}
\item the formatting of in-line quotes (i.e.~what `quotation marks' to
  use);
\item rendering of enumerations;
\item aspects of page layout (e.g.,~float placement).
\end{itemize}

As with most language-related actions, these usually admit a wide range
of formatting possibilities and can be considered to depend, at least
partially, on house style or other factors.

\section{Attaching Actions to Change of Language}

Having described some typical changes that need to be made at a
language tag, we now look at how to tie particular actions to a
particular tag, noting that it is not sensible, for example, to change
every aspect of the formatting if only an in-line fragment of a few
words is to be in a different language.

\subsection{Attaching actions to tags}

First we note the following facts.
\begin{itemize}
\item The type of actions that are required at language tags can be
  modeled by setting the values of a collection of parameters to those
  appropriate for the new language.

\item Some actions may not make sense at certain levels of the
  hierarchies. For example, while one wants to use the correct
  hyphenation algorithm at any level of the hierarchies, changes to
  micro-rendering, such as the positioning of diacritics, might be
  applied only to language changes for whole paragraphs but not for
  fragments.
\item However, for most actions it is not possible to specify one
  place in the hierarchies that will produce the correct location of
  that action for \emph{all} documents.  The correct place might, for
  example, depend on document type or on a particular house style.

\end{itemize}

There are at least two possibilities for specifying, for a
particular document, where in the tag hierarchy an action should be
`attached' (see Figure~\ref{fig:twohs}).  These are by the
nesting-level in the hierarchy of language tags or by the visual type
of the language tags as described in section~\ref{sec:visual}.  These
visual tag-types implicitly define a partial hierarchy, from the top:
document, base, block, fragment.


In both cases an action is defined to be executed down to a prescribed
level in the hierarchy.  As noted above, different actions might be
executed down to different levels so there needs to be a mechanism to
specify this level for each action.  To limit the complexity of the
model we think it is advisable to assume that this stopping level
depends on the action but not on the language.  It was pointed out in
Tsukuba that this is probably an oversimplification, i.e.~that there
exist cases where it would be better to model the formatting of
language-related items by attaching language/action pairs to
levels.  However, we think that these cases are sufficiently rare
that they can be handled by the action itself.\footnote{An action that
  depends both on language and level could be specified in the model
  by executing it on all levels with an additional conditional within
  the action body testing for the current language.}
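
A minimal sketch of such an action follows.  All names in it
(\verb|\lang@current|, \verb|\lang@depth|, \verb|\DiacriticsAction|,
\verb|\umlautstyle|) are hypothetical and chosen only for illustration;
the depth encoding (smaller numbers are higher in the hierarchy) is
likewise an assumption.

\begin{verbatim}
\makeatletter
\newcommand*\lang@current{english} % hypothetical: language in force
\newcount\lang@depth               % hypothetical: depth of current tag
\def\lang@german{german}
% The action is executed at every level; the dependence on language
% and level is tested inside the action body.
\newcommand*\DiacriticsAction{%
  \ifx\lang@current\lang@german
    \def\umlautstyle{low}%     German: applied at every level
  \else
    \ifnum\lang@depth<3
      \def\umlautstyle{normal}% other languages: above depth 3 only
    \fi
  \fi}
\makeatother
\end{verbatim}

The cost of this approach is that the language test is hidden in the
action body rather than being visible in the model's data structures.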


It is also possible to combine these two hierarchies and allow the
attachment of actions to tags via either hierarchy (see
Figure~\ref{fig:THD}).  In this case, for each action it is necessary
to define:
\begin{itemize}
\item on which of the two hierarchies the stopping of the action depends;
\item down to what level the action is carried out in that hierarchy.
\end{itemize}


\subsection{Data structures for this model}

For this model of language tags/actions, the system needs to specify the
contents of the following three data structures.

\subsubsection{Tag hierarchy diagram (THD)}

While combining the two hierarchies we have simplified their structure
(compare figures~\ref{fig:twohs} and~\ref{fig:THD}), i.e.~multiple
nestings of paragraphs are collapsed into a single node.
At the same time a new root node (document-level) was added. This node
serves as an anchor point for typographic requirements that should
stay fixed throughout the document even if the base language changes.

\begin{figure}
\centering
\setlength\unitlength{10pt}
\frame{%
\begin{picture}(23,12)(-2,-1)

\drawline(10,10)(10,8)(8,6)(8,2)(10,0)
\dottedline[$\bullet$]{2}(10,10)(10,8)
\dottedline[$\bullet$]{1}(8,6)(8,2)
\dottedline[$\bullet$]{2}(10,0)(10,0)

\drawline(10,8)(12,6)(12,2)(10,0)
\dottedline[$\bullet$]{4}(12,6)(12,2)

\multiputlist(8.5,10)(0,-2)[r]{document level,base language level}

\multiputlist(7,6)(0,-1.5)[r]{first nesting level,second nesting
         level,\ldots}

\multiputlist(8.5,0)(0,-2)[r]{n\textsuperscript{th} nesting level}

\multiputlist(13,6)(0,-4)[l]{block level,fragment level}
\multiputlist(11.5,0)(0,-4)[l]{nested fragment level}


\end{picture}%
}
\caption{Tag hierarchy diagram (THD)}
\label{fig:THD}
\end{figure}

The required number of significant nestings in the hierarchy of
nesting-levels is an open question but probably $n=3$ is
sufficient to specify typical formatting requirements.

The two end points of the hierarchies (n\textsuperscript{th}
nesting-level and nested-fragment-level) are combined as they
essentially mean to carry out attached actions in all cases; thus it
does not matter on which hierarchy they are specified.

Another interesting point is that the base-language-levels of both
hierarchies are combined.\footnote{From this it follows that in this
  model a base language change is only allowed between paragraphs.}

Nevertheless, it should be noted that the ``level'' of a tag within
the THD is logically described by a pair of nodes (one on each
hierarchy) even though in some cases these nodes collapse into one.
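
The following sketch shows one way of tracking this pair of nodes with
two counters.  The tag names (\verb|langblock|, \verb|\langfragment|),
the numeric depth encoding and the choice $n=3$ are assumptions made
for illustration; the language argument of the tags is omitted since
only the level is of interest here.

\begin{verbatim}
\documentclass{article}
\makeatletter
% Assumed depth encoding: 0 = document, 1 = base language;
% nesting hierarchy: 2 = first, 3 = second, 4 = nth level (n = 3);
% visual hierarchy:  2 = block, 3 = fragment, 4 = nested fragment.
\newcount\lang@nestdepth  \lang@nestdepth=1
\newcount\lang@visdepth   \lang@visdepth=1
% Go one nesting level deeper; deeper nestings collapse into depth 4.
\newcommand*\lang@deeper{%
  \ifnum\lang@nestdepth<4 \advance\lang@nestdepth 1 \fi}
% Hypothetical block-level tag (the environment supplies the group).
\newenvironment{langblock}
  {\lang@deeper \lang@visdepth=2 }
  {}
% Hypothetical fragment-level tag.
\newcommand\langfragment[1]{%
  \begingroup
    \lang@deeper
    \ifnum\lang@visdepth<3 \lang@visdepth=3 \else \lang@visdepth=4 \fi
    #1%
  \endgroup}
% Typesets the current level as a pair (nesting depth, visual depth).
\newcommand*\showlevel{(\the\lang@nestdepth,\the\lang@visdepth)}
\makeatother
\begin{document}
\showlevel                                   % (1,1)
\begin{langblock} \showlevel \end{langblock} % (2,2)
\langfragment{\langfragment{\showlevel}}     % (3,4)
\end{document}
\end{verbatim}

Because both counters are changed only locally, leaving a tag restores
the level of the enclosing tag without further bookkeeping.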


\subsubsection{Language actions table (LAT)}

This two-dimensional table (indexed by parameter-group and
language-label) stores the effect of each action (i.e.~the value for a
parameter-group) for each language (possibly only a default value if
no value has been explicitly defined for that language).  Each entry
is an expression that returns a set of values appropriate to the
parameter-group.

It may be possible\footnote{Such details can have large effects
on the efficiency of the implementation, thus we are being cautious
here.} to also allow special actions to be specified, such as:
\begin{itemize}
\item leave unchanged;
\item use some default (e.g.~the value for the document language).
\end{itemize}
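
A minimal sketch of such a table, stored in control sequence names, is
given below.  The commands \verb|\DeclareLanguageAction| and
\verb|\UseLanguageAction| and the label \verb|default| are hypothetical;
only \verb|\frenchspacing| and \verb|\nonfrenchspacing| are existing
\LaTeX{} commands.

\begin{verbatim}
\makeatletter
% \DeclareLanguageAction{<parameter-group>}{<language-label>}{<code>}
\newcommand\DeclareLanguageAction[3]{\@namedef{lat@#1@#2}{#3}}
% \UseLanguageAction{<parameter-group>}{<language-label>}
% uses the entry for "default" if the language has no entry of its own
\newcommand*\UseLanguageAction[2]{%
  \@ifundefined{lat@#1@#2}%
    {\@nameuse{lat@#1@default}}%
    {\@nameuse{lat@#1@#2}}}
\makeatother

\DeclareLanguageAction{spacing}{default}{\nonfrenchspacing}
\DeclareLanguageAction{spacing}{french}{\frenchspacing}
\DeclareLanguageAction{spacing}{latin}{} % special action: leave unchanged

\UseLanguageAction{spacing}{french} % executes \frenchspacing
\UseLanguageAction{spacing}{dutch}  % no entry: default is used
\end{verbatim}

In this sketch an empty entry plays the role of ``leave unchanged'',
while a missing entry selects the default.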



\subsubsection{Parameter assignment map (PAM)}
This one-dimensional table maps each action (modeled by a
parameter-group) to a single node in the THD.

Such an assignment means that this parameter group changes its value
(using the method specified in the LAT) at all levels down to (and
including) the node to which it is mapped.
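
The rule can be sketched as follows.  The commands \verb|\MapAction|
and \verb|\IfActionApplies|, the hierarchy names \verb|nesting| and
\verb|visual|, and the numeric depth encoding are all assumptions made
for this illustration.

\begin{verbatim}
\documentclass{article}
\makeatletter
% Assumed depths: 0 = document, 1 = base language;
% nesting hierarchy: 2 = first nesting level, 3 = second, ...
% visual hierarchy:  2 = block, 3 = fragment, 4 = nested fragment
\def\pam@visual{visual}
% \MapAction{<parameter-group>}{nesting|visual}{<depth>}  (one PAM entry)
\newcommand*\MapAction[3]{%
  \@namedef{pam@hier@#1}{#2}%
  \@namedef{pam@depth@#1}{#3}}
% \IfActionApplies{<parameter-group>}{<nesting depth of tag>}
%                 {<visual depth of tag>}{<true>}{<false>}
\newcommand\IfActionApplies[5]{%
  \def\pam@tagdepth{#2}%
  \expandafter\ifx\csname pam@hier@#1\endcsname\pam@visual
    \def\pam@tagdepth{#3}%
  \fi
  \ifnum\pam@tagdepth>\@nameuse{pam@depth@#1}\relax
    #5%
  \else
    #4%
  \fi}
\makeatother
\MapAction{hyphenation}{visual}{4} % executed at every tag
\MapAction{diacritics}{visual}{2}  % blocks, but not fragments
\begin{document}
% A fragment-level tag at the first nesting level, i.e. the pair (2,3):
\IfActionApplies{diacritics}{2}{3}{changed}{unchanged}  % "unchanged"

\IfActionApplies{hyphenation}{2}{3}{changed}{unchanged} % "changed"
\end{document}
\end{verbatim}

An action is thus executed at a tag exactly when the tag's node on the
relevant hierarchy is no deeper than the node stored in the PAM.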


\section{Special Regions}\label{sec:moving}

The scheme we have outlined so far will work well for the main text of
many documents but it needs to be supplemented in order to handle
formatting of the following material (called special regions):
\begin{itemize}
\item regions that contain text which has moved from other parts of
the document, e.g.,~table of contents, running heads;
\item regions of text that are first formatted and then the whole
block is moved, e.g.,~(from \LaTeX) floating tables, footnotes;
\item regions that can contain elements breaking the type hierarchy,
e.g.,~paragraphs in table-cells.
\end{itemize}

There are several problems that arise when ``moving things around'' in
a document: one of these, which arises only when logical (unformatted)
text is being moved, is the need to move language information with the
moving text.  This is needed even if the text being moved is in the
document language since this may not be the current language at the
point to which it moves.  Thus the data-type for `logical stuff being
moved' must be the text and a language-label (describing its
language).
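
A minimal sketch of this data-type, using the table of contents as the
receiving region, is shown below.  The commands \verb|\langtext|,
\verb|\langsection| and \verb|\currentlanguagelabel| are hypothetical;
\verb|\addcontentsline| is the standard \LaTeX{} command.

\begin{verbatim}
\documentclass{article}
% Hypothetical carrier for moved text: a language-label plus the text.
% This placeholder only typesets the text; a real version would first
% execute the language actions for the label given in #1.
\newcommand\langtext[2]{#2}
\newcommand*\currentlanguagelabel{english} % hypothetical state
\newcommand\langsection[1]{%
  \section*{#1}%
  \addcontentsline{toc}{section}%
    {\protect\langtext{\currentlanguagelabel}{#1}}}
\begin{document}
\tableofcontents
\langsection{Introduction}
\renewcommand*\currentlanguagelabel{german}
\langsection{Einleitung}
\end{document}
\end{verbatim}

The label is expanded when the entry is written, so each entry records
the language in force at its point of origin rather than the language
in force where the table of contents is typeset.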


\subsection{Formatting special regions}

A problem that affects the formatting of all special regions is how
to specify the language to be used and the effective level of language
tags contained within the special region.  It is not possible to simply
extend the THD and PAM from the main part of the document since these
assume that the nesting of the language tags in the logical document
is faithfully represented in the formatted document.  This is very
clearly not the case with regions such as floats or end-notes which
appear visually in totally unrelated parts of the document.  It is
also not true for paragraphs within tables since these can be,
logically, paragraphs within paragraphs, and our classification of
language tags into types does not allow for this.


One possible solution to this problem is to allow the specification of
a local PAM for each type of special region.  This requires:
\begin{itemize}
\item a method to set the starting-language for the region;
\item the specification of a local PAM for the region.
\end{itemize}

The disadvantage of this solution is its inherent complexity: for each
special region the designer of a document class needs to specify a
full mapping of all language-related actions to the tag hierarchy (the
local PAM).  Since the numbers of both the special regions and the
language-related actions are potentially unlimited, this would result
in either a very complex set-up mechanism or the use of general
defaults (e.g., the local PAM nearly always inherits from the global
document PAM) in which case the solution is unnecessarily complicated.

\subsection{A practical solution}

A simpler solution is to use the PAM from the main document but to
allow the specification, for each type of special region, of how the
information from the PAM is used.  This would be done by specifying
the following:
\begin{itemize}
\item a method to set the starting-language for the region;
\item the actual initialisation-level (init-level) for the change to
  this starting language;
\item the effective level (inner-level), as far as embedded tags are
  concerned, of this change to the starting-language for the region.
\end{itemize}
We now give an expanded description of these items.

\paragraph{Starting language}
In the case of special regions that receive unformatted text the
starting-language will directly affect only the text generated by the
region's tags themselves as each bit of received text will carry its
own language label (see section~\ref{sec:moving}).  In the case of
regions that move after being formatted it defines the default
language used when formatting this region.

\paragraph{Initialization}
At the start of the region, actions are executed as if the region
started with a tag whose level (in the THD, i.e.~a pair of nodes) is
init-level using this starting-language.  This results in setting
parameters to values suitable for that starting-language whilst
allowing for a region to move to a different visual context.

\paragraph{Inner processing}
Within the region, language tags are processed as if the region
started with a tag whose level (in the THD) is inner-level
(inner-level must be at least as deep\footnote{An alternative model
  would be to also allow inner-level to be one less than init-level.
  This would mean that language tags within the special region are
  acting as language changes on the same level as the starting
  language of the region.}
as init-level in the THD).  This allows finer control over the subset
of actions executed at embedded language tags.
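
The three items could be declared along the following lines.  The
command \verb|\DeclareSpecialRegion| and the package name
\verb|langsketch| are hypothetical, and levels are simplified here to a
single depth number (larger means deeper) instead of a pair of nodes
in the THD.

\begin{verbatim}
\makeatletter
% \DeclareSpecialRegion{<region>}{<starting-language>}
%                      {<init-level>}{<inner-level>}
\newcommand*\DeclareSpecialRegion[4]{%
  \ifnum#4<#3\relax
    \PackageError{langsketch}%
      {Region #1: inner-level is shallower than init-level}%
      {The inner-level must be at least as deep as the init-level.}%
  \else
    \@namedef{region@lang@#1}{#2}%
    \@namedef{region@init@#1}{#3}%
    \@namedef{region@inner@#1}{#4}%
  \fi}
\makeatother

\DeclareSpecialRegion{footnote}{english}{2}{3}
\DeclareSpecialRegion{toc}{english}{1}{3}
\end{verbatim}

The check in the declaration enforces the constraint stated above that
inner-level must be at least as deep as init-level.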



\section{Interfaces for the Rendering Model}

The following interfaces will be provided for use by writers of class
and package files:
\begin{itemize}
\item specifying the THD (this will probably be fixed, at least in the
  first version);
\item specifying entries in the PAM;
\item specifying entries in the LAT;
\item specifying explicitly that a language-command
  (i.e.~parameter-group) will potentially be used by the current
  package or class\footnote{These declarations allow the local
    customizations for all language actions to be stored in one place
    (e.g.,~PAM or LAT modifications); the system can then select only
    those that are actually needed for the current document.};
\item specifying the starting-language and init/inner levels
  for special regions;
\item handling language information for moving text.
\end{itemize}

In addition to the new commands and environments outlined in
Section~\ref{sec:newuser}, the following interfaces will be provided
for use in documents (the first two must be in the preamble):
\begin{itemize}
\item specifying the document-language;
\item specifying all the languages used in a document;
\item possibly an interface for overriding the starting language of
      a particular special region.
\end{itemize}
The second item above is not strictly necessary as the information can
be obtained by processing the document; however, a large saving of
time and space can be made if the full list of languages actually used
is specified in the preamble.
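
For comparison, the existing \textsf{babel} package already follows a
similar pattern: all languages used are listed as package options and
the last one becomes the main language.  The example below shows
\textsf{babel}'s interface, not the interface proposed here, and
assumes that hyphenation patterns for the three languages are
installed.

\begin{verbatim}
\documentclass{article}
% All languages used in the document; the last one is the main language.
\usepackage[german,french,english]{babel}
\begin{document}
Some English text with
\foreignlanguage{german}{ein paar deutsche W\"orter}.

\begin{otherlanguage}{french}
Un paragraphe en fran\c{c}ais.
\end{otherlanguage}
\end{document}
\end{verbatim}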

\end{document}
