On Sat, 14 Jun 1997, Vladimir Volovich wrote:

> > They use the default \lccode and
> > \uccode layout. It will not solve all problems with languages using
> > the Cyrillic script (and extensions of it), but at least you can
> > avoid bad hyphenation.
>
> Yes, the T2 encoding seems promising, but...
> The problem is that this proposed encoding does not correspond to
> the Russian encodings currently in wide use in TeX documents.
> The most popular encoding used in Russian TeX documents is currently
> DOS cp866, because the most popular Russian fonts (LH fonts and
> fonts developed by P.V.Ganelin and A.Shen) use this encoding.

Well, I see no difficulty in adding a T2 mapping to the LH fonts; they
already have a few mapping tables. But remember that e.g. a Georgian
user will not be happy with LH's default font encoding since the
`hard ghe' is missing...

> We also tried to use the inputenc package, for example, to process
> documents in a KOI-8 encoding. This works, but has some limitations.
> E.g. when one uses inputenc, there is no possibility to use
> the characters which are being translated as names of macros,
> because these characters become active. Also, there are some problems
> with AUX files.

??? I can't follow you here. The only approach which will work is to
separate input and output encoding, i.e. you map an input character to
a character macro if the character is >= 0x80, as is done for the T1
encoding:

[T2 encoding: the Russian A is on 0xC1, and the Russian a is on 0xE1]

Two approaches:

1) You say

     \DeclareTextSymbol{\RUSA}{T2}{"0C1}
     \DeclareTextSymbol{\rusa}{T2}{"0E1}

   (repeating this for all other Russian characters); then you add all
   such-defined characters to \@uclclist so that \MakeUppercase and
   \MakeLowercase work -- \lowercase and \uppercase should *never* be
   used directly!

     \begingroup
       \expandafter\toks@\expandafter{\@uclclist}%
       \toks@\expandafter{\the\toks@ \rusa\RUSA\rusb\RUSB...}%
       \expandafter\gdef\expandafter\@uclclist\expandafter{\the\toks@}%
     \endgroup

2) You define dummy character accents to avoid modifying \@uclclist
   for so many characters:

     \DeclareTextCommand{\cyra}{T2}[1]
       {\PackageError{T2enc}{You can't use the \string\cyra command directly}{}#1}
     \DeclareTextCompositeCommand{\cyra}{T2}{A}{\char "0C1}
     \DeclareTextCompositeCommand{\cyra}{T2}{a}{\char "0E1}
     ...

   Now you can map in an input encoding the Russian A to \cyra{A} and
   a to \cyra{a}.

Both approaches work well (the former I've used in my vncmr package for
Vietnamese to define an ET5 encoding, the latter for an experimental
LLW encoding using the `fil' option of the LH fonts to get more
characters) for *all* encodings in the range 0x80-0xFF, since the
interface used here for TeX is only 7bit, and \uccode and \lccode for
characters >= 0x80 will never be used.

But there is definitely a need for T2, since 0x80-0xFF is not
sufficient for all Cyrillic characters, and the characters in the range
0x00-0x7F *must* follow the default \lccode and \uccode values.

> BTW, it is interesting to know the opinion of members of this list
> about the following:
> not long ago Donald Knuth said that he is against any attempts
> to change Computer Modern fonts (this happened in one of the TeX
> distributions, probably teTeX, where they changed CM fonts so that
> metric files changed).
> But some of the popular Russian fonts for TeX are also based on the
> idea of changing CM fonts: these fonts replace some files in such a
> way that the resulting fonts are called cm*, but they also contain
> all Russian letters.
> These fonts do not change anything which corresponds to the original
> letters contained in CM fonts.

Don't do this! I had the same problem with my Vietnamese fonts. There
is a simple solution: if you need the original cm* macros, then do the
following to get new names: e.g.
you need cmb10.mf: then call your font rusb10.mf with the following
contents:

     if unknown cmbase: input cmbase fi

     def generate = enddef;
     def roman = enddef;

     input cmb10
     input rusroman

cmb10.mf will be read, but the command `generate roman' will be
ignored, so that you can load your own definitions afterwards
(contained in rusroman.mf).


    Werner
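
For the input-encoding side of approach 2, a minimal sketch might look
as follows. It assumes cp866 input (where the Russian A is byte 0x80
and the Russian a is byte 0xA0) and inputenc's \DeclareInputText. The
file name mycp866.def is hypothetical, only the first letter is shown,
and \cyra with its T2 slots is the illustrative definition from
approach 2, not a released encoding.

```latex
% mycp866.def -- hypothetical inputenc definition file (the name is an assumption)
\ProvidesFile{mycp866.def}
% cp866 places the Russian A at byte 0x80 (128) and the Russian a at 0xA0 (160).
% Each 8-bit input byte is mapped to the 7-bit form \cyra{<letter>}.
\DeclareInputText{128}{\cyra{A}}
\DeclareInputText{160}{\cyra{a}}
% ... the remaining letters would follow the same pattern
\endinput
```

Such a file would be selected with \usepackage[mycp866]{inputenc}.
Since the argument of \cyra is an ordinary 7-bit letter, \MakeUppercase
should turn \cyra{a} into \cyra{A} using only the default \uccode
values, which is why no change to \@uclclist is needed in this
approach.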