LATEX-L Archives

Mailing list for the LaTeX3 project


David Carlisle
Tue, 17 Jun 1997 13:50:33 +0100
> I've got some problem with inputenc package.

Your problem is mainly due to badly set up encoding specific commands.

> We use two ways to translate the input encoding (koi8-r) into TeX's internal encoding
> (LCY):

The main idea of the inputenc/fontenc system is that you do *not*
translate directly from the input encoding to the font encoding.
LaTeX's internal encoding for all such constructs should be a portable
7-bit form such that it may be read back at some other part of the
document (perhaps via the .aux file) where a potentially different
encoding is in force.

So for example in French you may type the latin-1 character <e-acute>
but that will be translated by inputenc to \'{e} and passed in that
form to the aux file etc, and finally when typeset converted to the
convention of the font encoding in force at that time, which may be
Cork (T1) in which case it will essentially be converted back to the
original character, or it may be OT1 in which case it will use \accent
or whatever. Note that the *same* input text may be converted in two
different ways: a heading might be typeset in T1 in the table of
contents, but in OT1 in the display heading (perhaps in a special font
not available with the composite letters needed to encode to T1).
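
As a rough illustration of the point above (a sketch only, assuming a
LaTeX installation with both OT1 and T1 fonts available), the same
input byte can be typeset under two font encodings in one document.
The byte is written as ^^e9 here so that the example itself stays 7-bit:

```latex
\documentclass{article}
\usepackage[OT1,T1]{fontenc}   % load both font encodings; T1 becomes the default
\usepackage[latin1]{inputenc}  % byte E9 is read as e-acute
\begin{document}
% Same input byte, T1 in force: a single composite glyph is used.
{\fontencoding{T1}\selectfont caf^^e9}

% Same input byte, OT1 in force: the accent is built with \accent.
{\fontencoding{OT1}\selectfont caf^^e9}
\end{document}
```

In both lines inputenc hands LaTeX the same internal 7-bit form; only
the font encoding in force at typesetting time decides how it is
rendered.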

In your case your input has been directly converted to the final
font encoding. When read back from the .aux file, these codes are
assumed to be an input encoding and so everything breaks.

> \DeclareInputText{"0E1}{\CYRA}

That is OK, but any command used as an input text must be ***robust***,
and in your case you want it to be specific to koi8 (or alt, or LCY,
or whatever).


> (because the definitions of russian letters in Babel package
> have the form \def\CYRV {^^82}).

These are incompatible with inputenc. You want an encoding-specific,
and in particular robust, definition
(compare the definitions of \ae, \ss and friends in t1enc.def).
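
A minimal sketch of what such a setup might look like, split into a
font encoding part and an input encoding part. The slot "82 for \CYRV
is taken from the Babel definition quoted above; the slot "80 for
\CYRA and the file names mentioned in the comments are assumptions
made for illustration, not a tested configuration:

```latex
% Font encoding side (for example in a file such as lcyenc.def).
% \DeclareTextSymbol makes the command robust and specific to LCY,
% so it survives a round trip through the .aux file in 7-bit form.
\DeclareTextSymbol{\CYRA}{LCY}{"80}  % slot "80 is an assumption
\DeclareTextSymbol{\CYRV}{LCY}{"82}  % slot from the quoted \def\CYRV{^^82}

% Input encoding side (for example in a koi8-r definition file
% loaded by inputenc). The input byte maps to the 7-bit command,
% never directly to a font slot.
\DeclareInputText{"0E1}{\CYRA}
```

With this split, a different input encoding only needs its own set of
\DeclareInputText lines, and a different font encoding only needs its
own \DeclareTextSymbol lines; neither side needs to know about the
other.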

Note that any mechanism that makes `high' characters of type `letter'
rather than `active' pays a very high price. It then forces the input
and font encodings to be the same. This means that any document, or
package or macro set written by a user of say KOI8 encoded fonts can
not use any macros written by users of other Cyrillic font encodings,
so the TeX community is fragmented and portability is destroyed.
If PC users had insisted on having catcode 11 slots so that they could
use \ss or \'e in macro names, then any such macros would have been
unusable (or unreadable) on Macintosh, unix, Windows, ...
Fortunately they did not do this and they restricted themselves to a
portable 7-bit set for command names. I strongly urge Russian users to do the
same. I accept that it is easier to do without \ss in German
than it is to do without the Cyrillic alphabet in Russian. There may
be possibilities to improve this situation, but I am just trying to
explain some of the thinking behind the current design, and the
reasons why the current mechanisms do not support the use of 8bit
characters in command names.

In an earlier message you suggested the use of emtex style code pages
for this kind of translation. The main argument against these is not
that they are emtex specific, but rather that they are bad in
principle as they totally break document portability. The document is
designed to run just with one specific input filter, but it carries
no information about this requirement, so is likely to break, even at
another emtex site. A similar feature was considered for web2c7 (see
some messages on this list earlier this year) but fortunately these
arguments about portability persuaded Karl not to enable the feature.
The second argument against these is that being essentially external to
TeX, they force the same encoding to be used throughout a document. As
inputenc is integrated with the TeX macro layer the input encoding
may be changed at arbitrary points in the document. Of course inputenc
pays a price in terms of speed. In the common case where just one
input encoding, and just one font encoding are used, it may be
possible to speed up the process by `freezing' the definitions,
cutting out the internal 7bit form. We have some experiments in that
direction, but the first thing is to get inputenc *working*; then we
can discuss how to speed it up!
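
To illustrate the point about changing the input encoding inside a
document, here is a small sketch using the \inputencoding command from
inputenc. It assumes the latin1 and cp437 definition files are
installed; the bytes are written in ^^ notation to keep the example
7-bit:

```latex
\documentclass{article}
\usepackage[latin1]{inputenc}  % document starts in latin-1
\begin{document}
caf^^e9                % byte E9 under latin1: e-acute

\inputencoding{cp437}  % switch input encoding mid-document
caf^^82                % byte 82 under cp437: also e-acute
\end{document}
```

Both lines should typeset the same word, since each byte is mapped to
the same internal form under its own input encoding. An external
filter such as a code page cannot do this, because it applies one
translation to the whole file.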