On Sat, 14 Jun 1997, Vladimir Volovich wrote:
> > They use the default \lccode and
> > \uccode layout. It will not solve all problems with languages using
> > the Cyrillic script (and extensions of it), but at least you can avoid bad
> > hyphenation.
>
> Yes, the T2 encodings seem promising, but...
> The problem is that this proposed encoding does not correspond to
> the currently widely used (in TeX documents) Russian encodings.
> The most popular encoding used in Russian TeX documents is currently
> DOS cp866, because the most popular Russian fonts (LH fonts and
> fonts developed by P.V.Ganelin and A.Shen) use this encoding.
Well, I see no difficulty in adding a T2 mapping to the LH fonts; they
already have a few mapping tables. But remember that e.g. a Georgian user
will not be happy with LH's default font encoding since the `hard ghe' is
missing...
> We also tried to use the inputenc package, for example, to process
> documents in a KOI-8 encoding. This works, but has some limitations.
> E.g. when one uses inputenc, it is not possible to use
> the translated characters in the names of macros,
> because these characters become active. Also, there are some problems
> with AUX files.
??? I can't follow you here. The only approach that will work is to
separate the input and output encodings, i.e. you map an input character
to a character macro if the character is >= 0x80, as is done for the T1
encoding.
[T2 encoding: the Russian A is on 0xC1, and the Russian a is on 0xE1]
There are two approaches:
1) you say
\DeclareTextSymbol{\RUSA}{T2}{"0C1}
\DeclareTextSymbol{\rusa}{T2}{"0E1}
(repeating this for all other Russian characters); then you add all
such-defined characters to \@uclclist so that \MakeUppercase and
\MakeLowercase work -- \lowercase and \uppercase should *never* be used
directly!
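\begingroup
  % Copy the current list of lowercase/uppercase pairs into \toks@ ...
  \expandafter\toks@\expandafter{\@uclclist}%
  % ... append the new pairs (lowercase first, then uppercase) ...
  \toks@\expandafter{\the\toks@
    \rusa\RUSA\rusb\RUSB...}%
  % ... and store the extended list globally.
  \expandafter\gdef\expandafter\@uclclist\expandafter{\the\toks@}%
\endgroup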
2) you define dummy character accents to avoid modifying \@uclclist for so
many characters:
\DeclareTextCommand{\cyra}{T2}[1]
  {\PackageError{T2enc}{You can't use the \string\cyra\space command
   directly}{Use \string\cyra\space only with an argument that has a
   T2 composite definition.}#1}
\DeclareTextCompositeCommand{\cyra}{T2}{A}{\char "0C1}
\DeclareTextCompositeCommand{\cyra}{T2}{a}{\char "0E1}
...
Now an input encoding can map the Russian A to \cyra{A} and the Russian a
to \cyra{a}.
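As a rough sketch of that mapping, an inputenc definition file could
contain lines like the ones below. This assumes KOI8-R input, where the
Russian a is byte 0xC1 (193) and the Russian A is byte 0xE1 (225); the file
name is hypothetical, and \cyra is the dummy accent from approach 2.

```latex
% koi8-r.def (hypothetical file name), read by the inputenc package.
% Assumption: KOI8-R input, where byte 193 (0xC1) is the Russian a
% and byte 225 (0xE1) is the Russian A.
\DeclareInputText{193}{\cyra{a}}
\DeclareInputText{225}{\cyra{A}}
```

With approach 1 the right-hand sides would be \rusa and \RUSA instead. In
both cases case changing only ever touches 7-bit letters or entries of
\@uclclist, never the 8-bit input byte itself.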
Both approaches work well (I've used the former in my vncmr package for
Vietnamese to define an ET5 encoding, and the latter for an experimental LLW
encoding using the `fil' option of the LH fonts to get more characters)
for *all* encodings in the range 0x80-0xFF, since the interface used here
for TeX is only 7-bit, and \uccode and \lccode for characters >= 0x80 will
never be used.
But the need for T2 is definitely there, since 0x80-0xFF is not sufficient
for all Cyrillic characters, and the characters in the range 0x00-0x7F
*must* follow the default \lccode and \uccode values.
> BTW, it is interesting to know the opinion of members of this list about the following:
> not long ago Donald Knuth said that he is against any attempts
> to change Computer Modern fonts (this happened in one of TeX distributions,
> probably teTeX, where they changed CM fonts so, that metric files changed).
> But one of the popular Russian font sets for TeX is also based on the idea of
> changing CM fonts: these fonts replace some files in such a way that
> the resulting fonts are called cm*, but they also contain all Russian letters.
> These fonts do not change anything which corresponds to the original letters
> contained in CM fonts.
Don't do this! I had the same problem with my Vietnamese fonts. There is a
simple solution: if you need the original cm* macros, wrap them in a file
with a new name. For example, if you need cmb10.mf, call your font
rusb10.mf and give it the following contents:
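if unknown cmbase:        % make sure the CM base macros are loaded
  input cmbase
fi
def generate = enddef;    % so that `generate roman' in cmb10.mf
def roman = enddef;       % is ignored
input cmb10               % read the original cmb10.mf
input rusroman            % then load your own definitions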
cmb10.mf will be read, but the command `generate roman' will be ignored so
that you can load your own definitions afterwards (contained in
rusroman.mf).
Werner