LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Date: Mon, 28 Mar 2005 20:41:01 +0200
From: Lars Hellström
Sender: Mailing list for the LaTeX3 project

At 22.36 +0200 2005-03-25, Jörg Knappen wrote:
>Since there are problems with typesetting lithuanian, latvian, sami and
>greenlandic with the T1 encoding, I have drafted another encoding for the
>baltic languages. Please look at the attached latex source file for
>details.

Very interesting!

Comments on things, as they appear in the file:


   \item[Swedish] \"A\"a \r{A}\r{a} \'E\'e \"O\"o

Yes, that's a reasonable set of characters (although anyone Swedish would
reorder them as \r{A}\r{a} \"A\"a ...). \'E\'e isn't a distinct letter, but
there are a couple of words where one usually makes use of it to clarify
the pronunciation ("ide" and "idé" have different meanings).


   {\notefont Note: According to my information, finnish has
   recently added \v{S}\v{s} and \v{Z}\v{z} to its alphabet.

Wikipedia at least seems to confirm that they should be available:
http://en.wikipedia.org/wiki/Finnish_alphabet


   \item[German] \"A\"a \"O\"o \SS\ss\ \"U\"u

Is an \SS character necessary? I don't mind having an \SS command, but
couldn't it just as well be implemented to expand to a pair of `S's?
(It's worth noticing that neither Unicode nor the Adobe Glyph List has any
uppercase equivalent of \ss, so apparently most of the world gets by
perfectly well without such a character.)
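
As a rough sketch of what such an implementation could look like: the
encoding name T1B is taken from the draft, but declaring it as below is
only an assumption for illustration, since no such encoding is actually
set up anywhere yet.

```latex
% Preamble sketch: T1B is declared here only so that the example is
% self-contained; a real encoding definition file would do this itself.
\DeclareFontEncoding{T1B}{}{}

% \SS needs no slot of its own: in T1B it simply expands to two S's.
\DeclareTextCommand{\SS}{T1B}{SS}
```

With a definition of this kind, the slot that would otherwise hold an
uppercase \ss stays free for other uses.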


   The characters \OE\ and \oe\ are part of the standard OT1
   encoding, they should be here, too. The same goes for the spanish
   inverted marks ?` and !`.

I disagree. Syntactic ligatures such as ?` and !` are generally a poor
solution, and since the characters are very peripheral in this encoding it
is better to leave them out. Quoting TLC2 p. 443:

   While most fonts also implement !` and ?` to generate ¡ and ¿,
   this feature is not universally available in all fonts.


   There should be more ligatures, ...

Yes. In fact, the optimal set-up is probably that all slots not needed to
support the various languages are made available for ligatures.

   ... names an fh ligature, the fj ligature (\emph{fjord}), and
   the fk ligature (also present in the \textsf{QX} encoding
   supported by the Latin Modern fonts).

Not only are such "collision" ligatures valuable, but so are "fancy"
ligatures such as st, ct, and end-of-word swashes. By leaving a healthy
fraction of the slots without mandated contents, the font designers are
given a chance to implement such features, if they wish to do so.


   If one disposes the perthousandzero

Yes, there is no need to include that bastard character.


   \section{More languages supported by the core encoding}
   This sections lists some languages which are supported by the
   core encoding without any additional characters needed.

I don't quite see why, e.g., Northern Sami is listed as a core language
whereas Lule Sami is not; if the criterion for inclusion is primarily
geography, then I fail to see why one should be included but not the
other.


   {\notefont Note: \~N and \'N are considered variants of
   the same letter, allthough they are different in standard
   character sets.

   Since they are variants one can choose \'N as the representative
   for the T1B encoding.}

Additional support for this can be found in

  http://www.statskontoret.se/upload/2619/TN36.pdf

(a technical standard published by the Swedish Agency for Public
Management). It also states that \"\I and \"\i are not needed for Southern
Sami.


   \item The Unicode Standard \cite{Unicode40} has \v{H}\v{h}
   on positions U+021E and U+021F with the annotation
   ``Finnish Romany''. Additional information on this language
   and its alphabet is needed.

http://www.kotus.fi/kielet/romani/ind_inenglish.shtml  might be a good
place to start with this.


   \item Are there more ligatures to be considered --
   ffh (probably not), ffj (don't know, if someone needs this
   one), ffk (maybe yes)?

Beyond the matter of making sure that slots are available, this is not an
issue that needs to be codified in an encoding spec, as it can be different
from font to font. The same goes for

   \item Shall the ligatures be ``ON'' by default?

One issue that might be worth thinking about, though, is what character
should be the default \hyphenchar. I have never seen the T1 hyphenchar in
slot 127 activated.
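
For reference, a minimal sketch of how the slot 127 hyphenchar can be
switched on by hand in a T1 document. It assumes the installed T1 fonts
(e.g. the EC fonts) actually provide a glyph in that slot; the narrow
\parbox is only there to force some hyphenation.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% \hyphenchar is a per-font parameter and assignments to it are global,
% so this affects only the font that is current at this point.
\hyphenchar\font=127
% A narrow box, so that TeX has to hyphenate some of the words.
\parbox{2cm}{Typographical considerations notwithstanding,
  internationalization necessitates hyphenation.}
\end{document}
```

Since the setting is per font, other sizes and shapes keep whatever
\hyphenchar they were loaded with.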


   What about \Gbar, \Tbar, and \Ezh? The \textsf{T4}
   encoding has established shorthands \verb+\B+ meaning ``with
   crossbar'' and \verb+\m+ meaning ``modified''. Adopt them
   for \textsf{T1B}?

I have no particular opinion about this, but whatever is done, it
should be done so that the commands can work "as expected" even when used
at times where the current encoding isn't T1B. A "modified" accent that is
only a shorthand sounds as though it could be a bit troublesome.
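
One way to get the "works in any encoding" behaviour for a plain symbol
is the usual pair of encoding-specific and default declarations. The
following is a sketch only: slot 128 is a made-up placeholder, not a
position taken from the draft, and T1B is declared here just to keep the
fragment self-contained.

```latex
% Preamble sketch; a real T1B would also need font definitions and
% \DeclareFontSubstitution before \Gbar could actually be typeset.
\DeclareFontEncoding{T1B}{}{}

% Hypothetical slot: 128 is a placeholder, not the draft's assignment.
\DeclareTextSymbol{\Gbar}{T1B}{128}

% When the current encoding has no definition of \Gbar, LaTeX switches
% to T1B locally to typeset it.
\DeclareTextSymbolDefault{\Gbar}{T1B}
```

Shorthands such as \B and \m that take an argument would need more care
than this, which is where the trouble mentioned above could arise.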

----

On the more general issue of _how_ to specify encodings, your T1B
draft has arrived at an interesting point in time.

As I have previously announced on this list, I have written a paper (a
standard proposal) on this subject, which can currently be found on CTAN
as the file fonts/utilities/fontinst/doc/encspecs/encspecs.tex. Some
drafts for formal specifications can be found in the same directory. Your
t1b.tex is interestingly enough quite close to what would be in such a
formal specification if the actual encoding layout was removed, so you
should find it rather straightforward to make use of this format when you
advance on this subject. It would also be interesting (at least for me) to
see how the process might work in cases where one is not just trying to
write down some established practices, but actually creating something new.

Starting in February, there has also been a discussion on the
[log in to unmask] mailing list about creating an encoding repository; the
consensus now is that this should reside in the CTAN fonts/encodings
directory. Hence the general issue might finally have started to gain some
momentum.

Lars Hellström
