
i guess that the section heading below is anything but correct
:-) but i do want to share some thoughts and comments i wrote during
the day on that topic, in the hope that you can help with further
thoughts/agreement/disagreement etc

frank

--------------------------


\section{Clarifying the term ``language'' within \LaTeX{}}

 Within structured documents each portion of the document is
explicitly or implicitly labeled with a ``language tag'' denoting that
this portion of the document contains data written in a certain
``language''.

These language tags have the following properties:
\begin{itemize}
\item
  They impose a tree structure on the document.
\item
  At any one point in the document the ``current language'' can be
  determined.
\end{itemize}
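
As an illustration of both properties, the following minimal sketch
nests three language regions and prints the current language at each
level. It assumes the \textsf{babel} package with its
\texttt{otherlanguage} environment and its \verb|\foreignlanguage| and
\verb|\languagename| commands; this is only one possible way to
express such tags and not something the discussion here depends on.

\begin{verbatim}
\documentclass{article}
% The last option listed is taken as the main document language.
\usepackage[french,german,english]{babel}
\begin{document}
Level 0 is \languagename.
\begin{otherlanguage}{german}
  Level 1 is \languagename.
  \foreignlanguage{french}{Level 2 is \languagename.}
  Back at level 1 we have \languagename.
\end{otherlanguage}
Back at level 0 we have \languagename.
\end{document}
\end{verbatim}

The regions are properly nested, so they form a tree, and
\verb|\languagename| answers the question of the ``current language''
at any point.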


\TeX{} provides a very special type of region which can be
regarded as a language in its own right: mathematical formulas.
Although such a region doesn't follow exactly the same markup
convention and doesn't employ the same implementation concepts (as it
has built-in support for various features), it is nevertheless
comparable in most respects on an abstract level.
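
A small sketch of why a formula behaves like a language region of its
own: the same input characters are interpreted and rendered
differently once the math delimiters are entered.

\begin{verbatim}
\documentclass{article}
\begin{document}
% Text region: upright letters, a hyphen, ordinary interword spaces.
Text region: a - b.

% Math region: math italic letters, a minus sign, and spacing that
% is determined by TeX rather than by the spaces typed in the input.
Math region: $a - b$.
\end{document}
\end{verbatim}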

Traditionally these language tags can have two independent semantics:
\begin{itemize}
\item
  Denoting the input conventions used within this region.

\item
  Denoting the rendering conventions used when formatting this region.
\end{itemize}

\subsection{Input conventions}

A number of implicit input conventions are employed within a
\TeX{}-based system. For example, each input character is assigned a
so-called category code that determines certain semantics of that
character. A category code of type ``letter'' or type ``other'' means
that in most situations\footnote{As \TeX{} is a macro language it is
of course possible that due to expansion of macros the interpretation
of the input is temporarily or even permanently modified.} the
character is rendered as a glyph in the slot position corresponding to
the \textsc{ascii} code value of the character in the current font. In
most cases this means that if, say, the character `a' is entered it
will eventually result in the glyph `a' in the current font. However,
one has to be aware that this crucially depends on the font encoding
in use.

From this observation it follows that for characters of this category
type either all font encodings have to produce the same logical glyph
or the correspondence between input and output representation has to
be explicitly marked up to allow the formatting process to select
fonts with the correct font encoding.
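
The dependence on the font encoding can be seen in the following
minimal sketch. It assumes the standard \textsf{fontenc} package and
the two common encodings T1 and OT1 with the default Computer Modern
family; the choice of encodings is only an example. The letter
\verb|a| (category code 11) yields the same logical glyph in both
encodings, whereas the character \verb|<| (category code 12) does
not: in OT1 the corresponding slot of the default roman font holds an
inverted exclamation mark.

\begin{verbatim}
\documentclass{article}
\usepackage[T1]{fontenc}% T1 becomes the default font encoding
\begin{document}
% Category codes of the two characters: 11 (letter) and 12 (other).
Category codes: \the\catcode`\a{} and \the\catcode`\<{}

% Same input, T1 encoding: the glyphs match the input characters.
T1: a < b

% Same input, OT1 encoding: the slot for `<' holds another glyph.
{\fontencoding{OT1}\selectfont OT1: a < b}
\end{document}
\end{verbatim}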

Another area of input conventions commonly used for language support
is that of the shortrefs, see section~\ref{sec:shortrefs}.

\subsection{Rendering conventions}

There are many rendering conventions that depend or can depend on the
language of a textual region. Examples are the hyphenation algorithm
employed, the production of fixed strings, typographical conventions,
etc.
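
A minimal sketch of two such conventions, fixed strings and the date
format, again assuming the \textsf{babel} package as one possible
implementation: the same two commands produce different text after
the language switch, and the hyphenation patterns in use change along
with them.

\begin{verbatim}
\documentclass{article}
\usepackage[german,english]{babel}% english is the main language
\begin{document}
% English conventions: ``Contents'' and an English-style date.
\contentsname, \today

\selectlanguage{german}
% German conventions: ``Inhaltsverzeichnis'' and a German-style date.
\contentsname, \today
\end{document}
\end{verbatim}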

These conventions range from those that most likely need changing when
switching from one language to another to those that can be considered
a matter of style and that do not produce incorrect results if left
unchanged. For most, if not all, rendering conventions this
classification is somewhat flexible: it depends on various factors
and might differ between applications.


\subsection{Discussion}

Both types of input conventions do not seem to be
necessarily tightly bound to the language of a region. For example, it
is perfectly possible (although currently not fully supported by
\LaTeX{}) to input text in a given language using more than one font
encoding, and consequently to need different input conventions within
the area denoted by a language tag.  Similarly, the shortrefs have an
application range that is not directly linked to language boundaries,
see section~\ref{sec:shortref-discussion} for further discussion.

Not all decisions the formatter has to make when encountering a
language tag (i.e., a change in languages) can be considered as being
globally bound to such a border. In fact, most of them depend on the
current nesting structure, e.g., embedded language fragments might be
handled differently than a document with this language as the main
language. This is not to say that there can't be a global bound,
but that one needs specification flexibility in deciding which
conventions are applied/changed at what level.
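
One existing instance of such level-dependent handling, given here
only as an illustration and assuming the behaviour documented for
current versions of the \textsf{babel} package (it may differ between
versions): a short embedded fragment marked with
\verb|\foreignlanguage| switches hyphenation and typographical rules
but, by default, not the fixed strings and the date format, whereas
the \texttt{otherlanguage} environment switches all of them.

\begin{verbatim}
\documentclass{article}
\usepackage[german,english]{babel}% english is the main language
\begin{document}
% Main language: date in the English format.
Main language: \today

% Embedded fragment: German hyphenation, but the date is expected
% to stay in the format of the surrounding language.
Short fragment: \foreignlanguage{german}{\today}

% Longer passage: all conventions switch, including the date.
Longer passage:
\begin{otherlanguage}{german}
\today
\end{otherlanguage}
\end{document}
\end{verbatim}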