LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

On Mon, Jul 05, 2004 at 07:31:34AM +0200, Werner LEMBERG wrote:

> [LaTeX 2e 2003/12/01]
>
> Is the following a known limitation or a bug?  And if it is a
> limitation, where is it documented?
>
>   \documentclass{article}
>
>   \usepackage[latin3]{inputenc}
>
>   \begin{document}
>   \tableofcontents
>   \section{\'^^b9}
>   \end{document}
>
> ^^b9 is the dotless i in latin 3 -- in the TOC, the accent is
> formatted incorrectly.  BTW, it doesn't matter whether OT1 or T1 is
> used.

The inputenc package translates the input characters that it controls
into TeX code. For ^^b9 this gives:
  \show^^b9
  ->\IeC {\i }
That is four tokens (\IeC, {, \i, }) instead of the single ^^b9 token.

This goes into the .aux and .toc file:
  \contentsline {section}{\numberline {1}\'\IeC {\i }}{1}

The purpose of \IeC is to make sure that a space following the
character is preserved:
  ^^b9 foobar     --> space between
  \i foobar       --> no space
  \IeC{\i} foobar --> space between
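
The three cases can be compared in a small test file. This is only a
sketch: it assumes pdfLaTeX (or another 8-bit engine) and that \IeC
is available at the document level, as it is once inputenc has been
loaded in the release discussed here.

  \documentclass{article}
  \usepackage[latin3]{inputenc}
  \begin{document}
  % active character: the following space survives
  ^^b9 foobar

  % control word: TeX skips the space after \i
  \i foobar

  % the closing brace ends the argument, so the space survives
  \IeC{\i} foobar
  \end{document}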

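Because the character expands to four tokens, the accent command \'
takes only the first of them, \IeC, as its argument. You therefore
need braces around such characters:
  \section{\'{^^b9}}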
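
Applied to the example from the question, a minimal sketch of the
corrected file looks like this (only the braces around ^^b9 are new;
everything else is taken from the original report):

  \documentclass{article}

  \usepackage[latin3]{inputenc}

  \begin{document}
  \tableofcontents
  % braces make the whole expansion \IeC {\i } the argument of \'
  \section{\'{^^b9}}
  \end{document}

The line written to the .toc file should then contain the accent
argument in braces as well, so that \' again sees one argument when
the table of contents is read back.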

Of course it would be possible to change the behaviour of inputenc so
that the translation into TeX code is deferred in protected contexts.
The 8-bit character itself would then go into the .aux and .toc file:
  \contentsline {section}{\numberline {1}\'^^b9}{1}

The disadvantage of this approach is that the \section command
and \tableofcontents are processed at different times, perhaps with
different input encodings. The wrong input encoding could then be
applied to the section title in the table of contents. To avoid
that, changes of the input encoding would have to be recorded in the
.toc file, too.
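
A hypothetical file that would run into this problem is sketched
below. It assumes the encoding is switched with \inputencoding from
inputenc and that byte ^^b9 is the superscript one in latin1; the
section titles are made up for illustration.

  \documentclass{article}
  \usepackage[latin3]{inputenc}
  \begin{document}
  % the .toc file is read here, while latin3 is active
  \tableofcontents
  % ^^b9 means dotless i under latin3
  \section{\'{^^b9}}
  \inputencoding{latin1}
  % ^^b9 means superscript one under latin1
  \section{Note^^b9}
  \end{document}

If the raw byte were written to the .toc file, both entries would be
interpreted under latin3 when the table of contents is typeset, so
the second title would show a dotless i instead of a superscript one.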

Yours sincerely
  Heiko