Pedro J. Aphalo writes:
> I am just replying to you, not the list now.

You are mistaken: you did reply to the list :-)

> On 18 Feb 97 at 13:42, Hans Aberg wrote:
>
> > "Pedro J. Aphalo" wrote:
> > >
> >
> > The reason I put these up was that we were asked to collect items
> > that are "language dependent"; it was specifically requested that
> > these should not depend narrowly on what might be implemented by TeX
> > or in the LaTeX3 project.
>
> OK, but we have probably made different assumptions about what Frank
> is looking for. Let him decide. Anyway, by suggesting things that
> are near the boundary of what is relevant and what is not, you have
> raised the issue of where the boundary is... And discussing this, I
> think, is even more relevant than cataloguing what might change when
> switching languages!

I agree that discussing those boundaries, and where they lie, will bring out fruitful new insights. The obvious items like hyphenation are clear to all of us, and they are handled by babel (as are some less obvious ones). If that were all there is to it, we could jump directly to a discussion of whether or not the interfaces of babel are at the right level, or deficient in some respects, etc. But I think the contributions so far have already shown that there are some items at least worth thinking about, namely how they would fit into a general scheme. (They might be controversial and might never find their way into any implementation, but that's a second step; they still might influence later discussions on interfaces.)

What I find less important is the question of whether or not a certain convention/format is widely used.
This is interesting information in its own right, but it is less important for the ultimate goal I have in mind: developing an interface that allows such items to be set at various levels (and reviewing the existing interfaces).

In other words, take the string-generating commands as an example. I think we agree that the words produced by, say, \today (if \today produces any words) might need to change if we change the language (another vague notion so far, but anyway). I think we also agree that there might be more than one possible format for such words (which suggests the need for a certain flexibility in specifying such formats) and/or, at a higher level, for the format of the complete string produced by \today. But I'm personally not very interested in discussing, say, the German DIN format and whether it is important, the only one, one of many, etc. Why not? Because I don't intend to write a package that provides one (or even the) "German language style". Instead, what I intend is to define interfaces that allow such information to be specified (and not necessarily in the form of style sheets: that is the approach taken by babel right now, and later in the discussion we do need to analyse whether or not it is the right level of abstraction).

> > Generally, I think there is not a clear boundary between actually
> > providing automatic language translation (an unsolved problem, far
> > beyond what one could achieve in the LaTeX3 project) and "language"
> > customizing of, say, quotes: there will be a scale of colours rather
> > than a clear-cut black/white situation. Basically, what is needed is
> > getting hold of some semantic information, not otherwise present in
> > the actual typeset output, which is entered during the typing
> > process. How much depends on what is practically feasible.
>
> I am very much in favour of including semantic information in a
> source, but up to a limit.
> In my opinion, that limit is set not by practicality of implementation
> but by practicality of use. For documents that are not going to be
> revised or reformatted, including semantic information usually implies
> a cost with a very small return... that's why so many people use Word
> et al. in office environments. If one is writing an article, a book or
> a manual, the cost/benefit balance shifts in favour of generalized
> markup.

Both remarks (by Hans and Pedro) are important, and we will most likely get back to them once we start analysing what the different items have in common and at which level(s) one might want to be able to change one or the other (one such level might be that one doesn't want to mark them explicitly at all). Don't get me wrong here: when I talk about levels, I don't mean that the items we have collected (and might collect) could be classified uniquely. That will not be the case, but we might nevertheless be able to identify certain such levels (e.g., language within language, in contrast to the background language...).

However, before going into anything like that, I would like to simply throw at you a scratch document containing material collected from the discussion so far. (It may not contain everything you found important, so point out anything you feel was neglected.) The document ends with some fragments on "language" copied from a different source, which might or might not be relevant. This document is not so much intended for discussion (it is mostly a collection of mail fragments from this list); it just puts all these little items together for better reference.

CLEARLY: if you look the items over and find further ones that are not included but a) are language dependent or b) are on the boundary of being so, don't hesitate to voice them.
Even though I would now like to carry the discussion further, any additions are still welcome.

====================

As a next step, I would like to focus for a while on what we mean, or could mean, by "language"; I hope the collection will provide us with a starting point here. Let me finish with a further citation from the exchange between Hans and Pedro on that subject:

> > > In the context of LaTeX3 I do not think we should worry too much
> > > about what is specific to a language, but invariant within the
> > > language. Such cases could be handled by language packages. Of
> > > course the hooks should be built into LaTeX3 so that language
> > > packages can do the customization without trouble (and so survive
> > > across minor releases of LaTeX).
> >
> > The problem is to define what a language really is, which is why it
> > is interesting to collect features that vary with languages, or
> > dialects, or subcultures, or groups of people crossing such
> > boundaries.
>
> I would rephrase this as: the problem is to find out which design and
> formatting issues are so tightly linked to language use that we
> cannot pretend they are independent.

=======================

So I guess that's it for tonight from me.
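For reference, this is roughly how the current babel approach handles the \today example and the kind of hook Pedro mentions: the date format is attached to a per-language hook, \date<language>, which babel executes whenever that language is selected, and a user can override it by appending to that hook with babel's \addto. This is only a minimal sketch, assuming a current babel with its english and german options; the numeric format is an arbitrary alternative chosen for illustration, not a proposal.

```latex
\documentclass{article}
\usepackage[english,german]{babel} % the last option (german) is the main language

% babel runs \dateenglish every time English is selected; appending to it
% replaces the English wording of \today by an (arbitrary) numeric format.
\addto\dateenglish{%
  \renewcommand*{\today}{\number\day.~\number\month.~\number\year}}

\begin{document}
\today                    % German month name, from babel's \dategerman
\selectlanguage{english}
\today                    % numeric form, from the addition to \dateenglish
\end{document}
```

Note that the customization here replaces the complete \today string for a language as a whole; there is no separate level at which only part of the format is set.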
Sorry for sending such a long mail; and do me (and my phone bill :-) one favour: don't reply to it by putting > in front of every line and then adding a comment of two or three lines at the end or in the middle.

Good night,
frank

---- snip -------
\documentclass{article}
\usepackage{shortvrb}
\MakeShortVerb{\|}
\begin{document}

\section{language dependent items}

\subsection{according to a babel BoF}
\begin{itemize}
\item hyphenation patterns and the associated |\lefthyphenmin| and |\righthyphenmin|
\item font encoding (output encoding)
\item direction of writing
\item input encoding
\item punctuation
\item quotation marks
\item captions and dates (perhaps several formats of dates)
\item mathematics (e.g., |\tan| gives either tan or tg)
\item typographic conventions
\item enumerating
\item ligatures
\item hyphen split (see the article by Jiri Zlatuska)
\item collating sequence (|\alph| etc.)
\item (conventions for emphasis, though these are more a matter of document class or publishing house conventions)
\end{itemize}

\subsection{from ltx-list discussions}
\begin{itemize}
\item for headings there exist certain traditions that require additional flexibility when specifying headings, such as the representation of numbers, certain text strings, positioning and spacing, or even fonts
\item the typeset area and its positioning might depend on regional traditions (but even if not, one might want easy ways to modify them)
\item commands that produce textual strings in some form depend on the ``language'' used in the document, even if their replacement text is not unique, i.e., even if there is more than one possible resolution, e.g., the \verb=\today= example
\item the use of hyphenation depends on the language (probably :-). Just as an aside: in TeX the |\patterns| to use depend not on the language alone but rather on a ``language/font-encoding'' pair, an unfortunate fact of life that is not yet really taken care of
\item the use of fonts might depend on the language, at least for high-quality typesetting. Why?
Because different languages have different distributions of letters, the gray value can change drastically from language to language when the same font is used (which might suggest using fonts with large or small x-heights for certain tasks, etc.). Or take the question of positioning the diacritics: german.sty goes a long way to move the umlaut (of the CM fonts) into a special position ``suitable'' for the German language
\item another thing TeX is very bad at: the non-use of ligatures. This is culture/language dependent, even if with MS Word and friends we might soon only have documents without them
\item the position and use of punctuation marks, e.g., which quotes do we use, and do we put the comma inside or outside the quote?
\item How do we typeset quotations? In some Danish books from the seventies, I have seen quotations typeset as
\begin{verbatim}
>>Text. Text. Text. Text. Text. Text.
>>Text. Text. Text. Text. Text. Text.
>>Text. Text. Text.<<
\end{verbatim}
\item How do we typeset dashes? What do we use for punctuation, number and hyphen dashes?
\item The question of what to use as first- and second-order quotation marks seems to be language related: in US English (and Swedish), quotes are nested as ``And then he said `foo bar', ... '' whereas in UK English it is `And then he said ``how bad'', ... ' I think.
\item In US English, the number $10^9$ is written as ``one billion'', whereas in UK English it is written as ``one milliard''. (After the French Revolution, the metric system and the system with ``milliard'' were introduced, and the British, like the Swedes, started using that; later the French switched back to the original system, the one used in the US.)
\item In principle, one could think of special commands for cardinal numbers; one might then use the source code to see which number was intended. :-)
\item Some languages, like Spanish, start exclamations and questions with an upside-down punctuation mark.
So one could think of this as a language-dependent feature: one enters (logically) |\Exclamation{Foo}| or |\Question{Bar}|, and the language package inserts the correct punctuation.
\item In Swedish decimal numbers, the roles of ``,'' and ``.'' are reversed relative to English, so a number that would appear as ``123,456.78'' in English would be ``123.456,78'' in Swedish. So this could be considered a language-dependent feature: one enters (logically) |\Number{..}|, and the language package selects the correct output format.
\item (are the last two items no good?)
\item transcription. ``Transcription'' here means systems such as transcribing Russian letters into English letters, but also rendering German umlauts and Swedish special letters as ``oe'', ``aa'', and the like. If you have a bibliographic database, it turns out to be inconvenient to remember all those transcribed diacritical marks, so for that purpose one would want a simplified English (or ASCII) transcription. For example, the German and Swedish diacritical marks would simply be dropped. (This is better appreciated when doing searches in a language where you do not know what those diacritical marks mean; it is very difficult to get them right.)
\item alphabetic search of one language in another. There is one such item for each pair of languages, but what I have in mind is that a language package should supply the case where the language transcribed into, and in which the alphabetic search is done, is English.
\item The bullets of the itemize environment (AFAIK, french.sty already does this)
\item The counter styles in the enumerate environment. The current interface to those is clumsy, to say the least.
\item \ldots
\end{itemize}

\section{Questions to be asked}
\begin{itemize}
\item what is a ``language''?
\item what needs to be ``settable'' (never mind how, for the moment) if a document is written completely in language foo? (Note that I don't say anything about being settable to a single value.)
\item what would you like to be able to set (again, never mind how for the moment) if the document is set in language foo but some parts, say chapters, are in language bar and some in baz?
\item what needs to be adjusted when individual phrases or other small items are set in a language different from the surrounding language?
\end{itemize}

\section{Clarifying the term ``language''}
[something we had better do]

\section{misc comments}
From: Pedro J. Aphalo

In my opinion what we need is: a) defaults that change according to language for things like |\date|; b) easy customization of what may depend on design, especially within different or the same ``flavo(u)r'' of a language, for example, typing quotation marks using commands. In the context of LaTeX3 I do not think we should worry too much about what is specific to a language, but invariant within the language. Such cases could be handled by language packages. Of course the hooks should be built into LaTeX3 so that language packages can do the customization without trouble (and so survive across minor releases of LaTeX). What should be handled by LaTeX3 are the cases of things that may vary both between and within languages (especially English).

From: Hans Aberg

The problem is to define what a language really is, which is why it is interesting to collect features that vary with languages, or dialects, or subcultures, or groups of people crossing such boundaries.

From: Pedro J. Aphalo

I would rephrase this as: the problem is to find out which design and formatting issues are so tightly linked to language use that we cannot pretend they are independent.

\subsection{Language (from a paper on Unicode by Chris \& Frank)}
\label{sec:lang}
Most of the transformations, relationships and structures discussed in this paper would be informally described as being dependent on the language in which the text is written.
Although this paper will not attempt to give a formal definition of the concept `language', it is nevertheless a concept central to almost all the text applications considered above (see also comments in \cite[Section~5.3]{UU}). Fortunately, for our purposes it is simply a label; its importance is such that each bit of text must have a unique `language-label' attached to it. This `language-label' can be (and usually is) used by each text application to define or indicate which transformations are appropriate for that part of the text. It is also used by the formatter, since it influences many aspects of typesetting.

For monolingual documents, this `language-label' (and the necessary environment) may be provided as a default in the set-up of the document processing system or, increasingly, as part of the `locale' set-up of the operating system. However, if the document contains text in different languages, such defaults will not suffice. Even in monolingual documents it is more robust to add the language label, since often only part of the document will be moved between applications (e.g., as a result of a cut-and-paste operation), and the applications may be running on different machines and hence potentially under a different locale.

The following algorithms and applications, amongst others, need to use information obtained by first interrogating this `language-label'.
\begin{itemize}
\item finding sub-words;
\item the hyphenation rules;
\item the dictionaries and heuristics used by the spell-checker;
\item grammar- and style-checking;
\item the generated text used in various places: for example, the word `Chapitre' in the title \textbf{Chapitre 10: Quelle langue?};
\item which aesthetic ligatures should be used when the necessary glyphs are available in the font being used;
\item the use of alternate forms (ligatures) that are dependent on the logical context (but independent of the font);
\item the collating sequence used and other aspects of sorting for databases and ordered lists such as indexes and bibliographies.
\end{itemize}

Some applications also use the language-label to specify exactly which character codes to expect within text with that language-label; for example, the system may wish to issue an error message or a warning when an unexpected character appears.

As many of the transformations above suggest, this concept of a single unstructured language-label is not completely congruent with more general uses of the word `language'; it is a more detailed identification than that referred to in the last paragraph of \cite[Section~5.3]{UU}. For example, encoding the use of different hyphenation rules for the same written language would require different language-labels; more obviously, if the same spoken language can be written in distinct scripts, each of these will require a separate language-label. The concept is, however, at the right level of abstraction for an application-independent representation, and it can be used as an efficient encoding of (or pointer to) information that is required by a large range of text applications.
\end{document}
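To relate the language-label idea to what babel already offers, here is a minimal sketch, assuming a current babel with its english and swedish options. The otherlanguage environment and \foreignlanguage act as language labels, and babel's per-language hooks \extrasswedish and \noextrasswedish are used to attach Hans's number-format convention to the Swedish label. The macros \Number, \dsep and \gsep are hypothetical names invented for this illustration; babel does not provide them.

```latex
\documentclass{article}
\usepackage[swedish,english]{babel} % the last option (english) is the main language

% Hypothetical logical markup for the \Number suggestion (not part of babel):
\newcommand*{\dsep}{.}              % decimal separator, English default
\newcommand*{\gsep}{,}              % digit-group separator, English default
\newcommand*{\Number}[1]{\mbox{#1}} % keep the number unbroken

% Attach the Swedish convention to the Swedish label via babel's hooks:
% \extrasswedish runs when Swedish is switched on, \noextrasswedish when it is left.
\addto\extrasswedish{\renewcommand*{\dsep}{,}\renewcommand*{\gsep}{.}}
\addto\noextrasswedish{\renewcommand*{\dsep}{.}\renewcommand*{\gsep}{,}}

\begin{document}
% Document default (English label): typesets 123,456.78
\Number{123\gsep 456\dsep 78}, \figurename, \today

% Swedish label for a larger block: number format, hyphenation,
% captions (\figurename) and the date all follow the label.
\begin{otherlanguage}{swedish}
\Number{123\gsep 456\dsep 78}, \figurename, \today
\end{otherlanguage}

% Swedish label for a short phrase: hyphenation and the \extras hook
% are switched (so 123.456,78), but captions and dates are not.
\foreignlanguage{swedish}{\Number{123\gsep 456\dsep 78}}
\end{document}
```

The difference between otherlanguage (which switches everything) and \foreignlanguage (which, for short phrases, switches only hyphenation and the \extras hook) is one existing instance of the distinction between language within language and the background language mentioned in the mail.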