## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Pedro J. Aphalo writes:
> I am just replying to you, not the list now.

you are mistaken, you did reply to the list :-)

> On 18 Feb 97 at 13:42, Hans Aberg wrote:
>
> >
>
> >   The reason I put these up, was that it was asked for items to be
> > collected that are "language dependent"; it was specifically asked for that
> > these should not be narrowly dependent on what might be implemented by TeX or
> > in the LaTeX3 project.
>
> o.k., but probably we have made different assumptions of what Frank
> is looking for. Let him decide. Anyway, by suggesting things that
> are near the boundary of what is relevant and what is not, you have
> raised the issue of where the boundary is... And discussing this, I
> think, is even more relevant than cataloguing what might change when
> switching languages!

I do agree that discussing those boundaries and where they lie will
bring out fruitful new insight. The obvious items like
hyphenation are clear to all of us --- and they are handled by babel
(as well as some less obvious ones) and if that is the only thing
there is to it we could directly jump to a discussion of whether or
not the interfaces of babel are at the right level or deficient in
some respects etc. But i think that the contributions so far have
already shown that there are some items that are at least worth
thinking about; how they would fit into a general scheme (they might be
controversial and might never find their way into any implementation but
that's a second step, they still might influence later discussions on
interfaces)

what i do find less important is the question whether or not a certain
convention/format is widely used. This is interesting information in
its own right but it is less important for the ultimate goal i have
in mind: developing an interface that allows setting such items on
various levels (and reviewing the existing interfaces)

in other words, take the string generating commands as an example. i
think we do agree that the words produced say by \today (if today
produces any words) might need to be changed if we change language
(another vague thing yet, but anyway). I think we also agree that
there might be more than one possible format (which suggests the need
for certain flexibility in the specification of such a format(s) ...)
for such words and/or on a higher level the format of the complete
string produced by \today.  but i'm personally not very interested to
discuss say the German DIN format and whether it is important, the
only one, one of many etc. Why not? because i don't intend to write a
package that provides one (or even the) "german language
style". Instead what i intend is to define interfaces allowing to
specify such information (and not even necessarily in form of style
sheets --- that is the approach taken by babel right now and later in
the discussion we do need to analyse whether or not this is the right
level of abstraction)

> >   Generally, I think there is not a clear boundary between actually
> > providing automatic language translation, an unsolved problem, far beyond
> > what one could achieve in the LaTeX3 project, and "language" customizing
> > say quotes: There will be a scale of colours, rather than a black/white
> > clear cut situation. Basically, what is needed, is getting hold of some
> > semantic information, otherwise not present in the actual typeset output,
> > which is entered in the typing process. How much, depends on what is
> > practically feasible.
>
> I am very much in favour of including semantic information in a
> source, but up to a limit. In my opinion
> that limit is not set by practicality of implementation, but
> practicality of use. For documents that are not going to be revised
> or reformatted, including semantic information usually implies a cost
> with a very small return... that's why so many people use Word et al.
> in office environments. If one is writing an article, a book or a
> manual the cost/benefit changes in favour of generalized markup.

both remarks (by Hans and Pedro) are important and we will most likely get
back to them once we start analysing a little bit what the different
items have in common and on what level(s) one might want to be able to
change one or the other (one such level might be that one doesn't want
to mark them explicitly at all) --- don't get me wrong here. when i
talk about levels i don't mean that the items we have collected (and
might collect) could be classified uniquely. This will not be the
case, but nevertheless we might be able to determine certain such
levels (eg language within language in contrast to background language
...)

however before going into discussing anything like that i would like
to simply throw a scratch document at you containing collected
material from the discussion so far (it may not contain everything you
found important so point out everything that you feel was neglected)
the document ends with some fragments on "language" copied from a
different source which might or might not be relevant. this document
is not so much intended for discussion (as it is mostly a collection
of mail fragments from this list) but just to put all these little
items together for better reference

CLEARLY: if you look the items over and you find further ones that are
not included but fall into the category of a) being language dependent
or b) are on the boundary of being so --- don't hesitate to voice
them. even if i now would like to start to carry the discussion
further any additions are still welcome

====================

as a next step i would like to focus for a while on what we mean by
language or what we could mean by that --- i hope that the collection
will provide us with some starting point here. let me finish with
a further citation from the email between Hans and Pedro on that subject:

> > >In the context of LaTeX3 I do not think we should worry too much
> > >about what is specific to a language, but invariant within the
> > >language. Such cases could be handled by language packages.  Of
> > >course the hooks should be built into LaTeX3 so that language
> > >packages can do the customization without trouble (and so survive
> > >across minor releases of LaTeX).
> >
> >   The problem is to define what a language really is, which is why it is
> > interesting collecting features that vary with languages, or dialects, or
> > subcultures, or groups of people crossing such boundaries.
>
> I would rephrase this to: the problem is to find out which design and
> formatting issues are so tightly linked to language use, that we
> cannot pretend that they are independent.

=======================

so i guess that's it for tonight from me. sorry for sending such a
long mail; and do me (and my phone bill :-) one favour: don't reply to it
putting > in front of every line and then adding a comment of two or
three lines at the end or in the middle

good night
frank

---- snip -------

\documentclass{article}

\usepackage{shortvrb}

\MakeShortVerb{\|}

\begin{document}

\section{language dependent items}

\subsection{according to a babel bof}

\begin{itemize}
\item hyphenation patterns and associated left- and righthyphenmin
\item fontencoding (outputencoding)
\item direction of writing
\item input encoding
\item punctuation
\item quotation marks
\item captions and dates (perhaps several formats of dates)
\item mathematics (ie |\tan| gives either tan or tg)
\item typographic conventions
\item enumerating
\item ligatures
\item hyphen split (see article from Jiri Zlatuska)
\item collating sequence (|\alph| etc.)
\item (conventions for emphasis, but more for document class,
publishing house conventions).
\end{itemize}
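
A minimal sketch of the ``captions and dates'' item, assuming babel is
loaded with its |german| and |english| options (the choice of languages
is only for illustration). Switching the language changes the strings
behind |\today| and |\contentsname|:
\begin{verbatim}
\documentclass{article}
\usepackage[german,english]{babel} % last option is the main language
\begin{document}
\today\ (\contentsname)   % English date format and caption

\selectlanguage{german}
\today\ (\contentsname)   % German date format and caption
\end{document}
\end{verbatim}
This only shows the existing style-sheet approach; it says nothing
about whether that is the right level for such an interface.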

\subsection{from ltx-list discussions}

\begin{itemize}
\item representation of numbers, certain text strings, positioning and
spacing or even fonts

\item the typeset area and its positioning might depend on regional
traditions (but even if not one might want easy ways to modify
them)

\item commands that produce textual strings in some form depend on the
"language" used in the document even if their replacement text is
not unique, eg even if there is more than one possible resolution
eg the \verb=\today= example

\item the use of hyphenation depends on the language (probably :-) ---
just as an aside: in TeX the |\patterns| to use do not depend on the
language but rather on a "language/font-encoding" pair which is an
unfortunate fact of life not yet really taken care of

\item the use of fonts might depend on the language at least for
high-quality typesetting. why? because different languages have
different distributions of letters and so the gray value can change
drastically if you move from language to language using the
same font (something that might suggest using fonts with large or
small x-heights for certain tasks, etc). Or take the question of
positioning the diacriticals. german.sty goes a long way to move
the umlaut (of the cm fonts) into a special position "suitable" for
German language

\item another thing TeX is very bad at: the non-use of ligatures.
this is culture/language dependent even if with MS-word and friends
we might soon only have documents always without them

\item the position and use of punctuation marks, eg what quotes do we use
do we put the comma inside or outside the quote

\item How do we typeset quotations? In some Danish books from the
seventies, I have seen quotations typeset as
\begin{verbatim}
>>Text. Text. Text. Text. Text. Text.
>>Text. Text. Text. Text. Text. Text.
>>Text. Text. Text.<<
\end{verbatim}

\item How do we typeset dashes? What do we use for punctuation, number
and hyphen dashes?

\item The question of what to use as first and second order quotation
marks seems to be language related: In US English (and Swedish),
quotes are nested as ``And then he said `foo bar', ... '' whereas in
UK English, it is `And then he said ``how bad'', ... ' I think.

\item In US English, the number 1e9 is typeset as ``one billion'',
whereas in UK English, it is typeset as ``one milliard''. (After the
French revolution, the metric system, and the system with
``milliard'' were invented, and the British, as the Swedes, started
using that; later the French switched back to the original system,
the one used in the US.)

\item In principle, one could think of special commands for cardinal
numbers; one might then use the source code, to see which number was
intended. :-)

\item Some languages, like Spanish, start exclamations and questions
with an up-side-down interpunctuation mark. So one could think of
this as a language dependent feature; one enters (logically)
|\Exclamation{Foo}|, or |\Question{Bar}|, and the language package
inserts the correct interpunctuation.

\item In Swedish decimal numbers, the use of ``,'' and ``.'' are
reversed relative to English, so a number that would appear as
``123,456.78'' in English, would be ``123.456,78'' in Swedish. So,
this could be considered as a language dependent feature; one
enters (logically) |\Number{..}|, and the language package selects
the correct output format.

\item last two no good?

\item transcribation

"Transcribation" means systems like transcribing Russian letters
into English letters, but also translating German umlauts, and
Swedish special letters into "oe", "aa", and the like.

If you have a bibliographic database, it turns out to be
inconvenient remembering all those transcribed diacritical marks, so
for that purpose, one would want to have a simplified English (or
ASCII) transcribation. For example, the German and Swedish
diacritical marks would simply be dropped.  (This is better
appreciated when doing searches in a language where you do not know
what those diacritical marks mean; it is very difficult getting
them right.)

\item alphabetic search of one language in another

There is one item for each pair of languages, but what I have in
mind is that a language package should supply the case where the
language transcribed to, and in which alphabetic search is done, is
English.

\item The bullets of the itemize environment (afaik, french.sty

\item The counterstyles in the enumerate environment. The current
interface to those is clumsy, at least.

\item \ldots
\end{itemize}
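
The |\Exclamation| and |\Question| idea from the list above could be
sketched roughly as follows. The hook names |\ExclOpen| and
|\QuestOpen| and the switch |\UseSpanishPunctuation| are hypothetical,
invented for this illustration; a language package would presumably
set such hooks as part of its own language switch. The |\Number| case
is not covered here.
\begin{verbatim}
\documentclass{article}
% Hypothetical hooks: empty by default, so nothing is put in front.
\newcommand\ExclOpen{}
\newcommand\QuestOpen{}
% Logical markup: the author marks the kind of sentence only.
\newcommand\Exclamation[1]{\ExclOpen #1!}
\newcommand\Question[1]{\QuestOpen #1?}
% Hypothetical switch standing in for a language package.
\newcommand\UseSpanishPunctuation{%
  \renewcommand\ExclOpen{\textexclamdown}%
  \renewcommand\QuestOpen{\textquestiondown}}
\begin{document}
\Exclamation{Foo} \Question{Bar}   % closing marks only

\UseSpanishPunctuation
\Exclamation{Foo} \Question{Bar}   % inverted opening marks added
\end{document}
\end{verbatim}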

\begin{itemize}
\item what is a ``language''?

\item what needs to be "settable" no matter how for the moment if a
document is written completely in language foo (note that i don't
say anything about settable to a single value)?

\item what would you like to be able to set (again not how at the moment)
if the document is set in language foo but some parts, say chapters
are in languages bar and some in baz

\item what needs to be adjusted when individual phrases or other small
items are set in a language different from the surrounding
language?
\end{itemize}

\section{Clarifying the term ``language''}

[ something we better do ]

From: "Pedro J. Aphalo"

In my opinion what we need is:
a) defaults that change according to language for things like |\date|.
b) easy customization of what may depend on design, especially within
different or the same "flavo(u)r" of a language. For example typing
quotation marks using commands.

In the context of LaTeX3 I do not think we should worry too much
about what is specific to a language, but invariant within the
language. Such cases could be handled by language packages.  Of
course the hooks should be built into LaTeX3 so that language
packages can do the customization without trouble (and so survive
across minor releases of LaTeX).

What should be handled by LaTeX3 are the cases of things that may
vary both between and within languages (especially English).

From: Hans Aberg

The problem is to define what a language really is, which is why it is
interesting collecting features that vary with languages, or dialects, or
subcultures, or groups of people crossing such boundaries.

From: Pedro J. Aphalo

I would rephrase this to: the problem is to find out which design and
formatting issues are so tightly linked to language use, that we
cannot pretend that they are independent.

\subsection{Language (from a paper on unicode by Chris\&Frank)}
\label{sec:lang}

Most of the transformations, relationships and structures discussed in
this paper would be informally described as being dependent on the
language in which the text is written.

Although this paper will not attempt to give a formal definition of
the concept `language', it is nevertheless a concept central to almost
\cite[Section~5.3]{UU}).  Fortunately, for our purposes it is simply a
label; its importance is such that each bit of text must have a unique
`language-label' attached to it.  This `language-label' can be (and
usually is) used by each text application in order to define or
indicate what transformations are appropriate for that part of the
text.  It is also used by the formatter since it influences many
aspects of typesetting.
For mono-lingual documents this `language-label' (and the necessary
environment) may be provided as a
default in the set-up of the document processing system or,
increasingly, as part of the `locale' set-up of the Operating System.
However, if the document contains text from different languages, such
defaults will not suffice.
Even in mono-lingual documents it is more robust to add the language
label since often only part of the document will be moved between
applications (eg as a result of a cut-and-paste operation) and the
applications may be running on different machines and hence
potentially under a different locale.

The following algorithms and applications, amongst others, need to
use information obtained by first interrogating this `language-label'.

\begin{itemize}
\item finding sub-words;
\item the hyphenation rules;
\item the dictionaries and heuristics used by the spell-checker;
\item grammar- and style-checking;
\item the generated text used in various places: for example, the word
`Chapitre' in the title \textbf{Chapitre 10: Quelle langue?};
\item which aesthetic ligatures should be used when the necessary glyphs
are available in the font being used;
\item the use of alternate forms (ligatures) that are dependent on the
logical context (but independent of the font);
\item the collating sequence used and other aspects of sorting for
databases and ordered lists such as indexes, bibliographies.
\end{itemize}

Some applications also use the language-label to specify exactly what
character-codes to expect within text with that language-label; for
example, the system may wish to issue an error-message or a warning
when an unexpected character appears.

As many of the transformations above suggest, this concept of a single
non-structured language-label is not completely congruent with more
general uses of the word `language'; it is a more detailed
identification than that referenced in the last paragraph
of \cite[Section~5.3]{UU}.
For example, to encode the use of different hyphenation rules for the
same written language would require different language-labels; more
obviously, if the same spoken language can be written with distinct
scripts, each of these will require a separate language-label. The
concept is, however, at the right level of abstraction for an
application-independent representation and it can be used as an
efficient encoding of (or pointer to) information that is required by
a large range of text applications.
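
A minimal sketch of attaching such a label in the source, assuming
babel's |\foreignlanguage| command and |otherlanguage| environment are
taken as the labelling mechanism (the text above does not prescribe
one, and the choice of French and English is only for illustration):
\begin{verbatim}
\documentclass{book}
\usepackage[T1]{fontenc}
\usepackage[english,french]{babel} % french is the main language
\begin{document}
\chapter{Quelle langue?}  % generated heading text is French
Un mot \foreignlanguage{english}{labelled as English} dans la phrase.

\begin{otherlanguage}{english}
A longer passage labelled as English. Hyphenation and generated
strings follow the label inside this environment.
\end{otherlanguage}
\end{document}
\end{verbatim}
Here the label is scoped by markup in the source, so it travels with
the text when a part of the document is moved elsewhere.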

\end{document}