Mon, 16 Mar 2009 17:23:00 +1030
Hello,
As TeX Live 2009 approaches (too quickly), I'd like to raise once again the
problems of using inputenc with XeTeX and LuaTeX. Sorry to add to the
distractions of those members of the LaTeX3 team who are also busy
working on the expl3 code...
The problem is clear: LaTeX assumes 8-bit encodings and clobbers true
UTF-8 input when one of the newer engines is used. This matters
because "\usepackage[utf8]{inputenc}" is the first thing many new
users write when moving their documents over to a new engine.
Elie Roux and I have been writing packages that attempt to work around
the problem for LuaTeX and XeTeX, respectively. Our development code
is here:
<http://github.com/eroux/lualatex/tree/master>
<http://github.com/wspr/xetex-inputenc/tree/master>
We'd like inputenc to branch to these packages when it is loaded under
the corresponding engine.
* * *
The LuaTeX version is more robust, but both packages attempt to "do
the right thing" when they're loaded: UTF-8 input is left untouched,
and the other encodings are processed as well as they can be. In the
LuaTeX package, the original inputenc is still loaded and used to
transform the characters.
In the XeTeX package, I take the lazy way out and simply have XeTeX
read the files in the requested encoding, bypassing inputenc entirely.
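For comparison, XeTeX can re-decode input with its own primitives: \XeTeXinputencoding changes the encoding for the rest of the current file, and \XeTeXdefaultencoding sets it for files opened afterwards. A minimal sketch, not the package's actual code, assuming a hypothetical Latin-1 file oldchapter.tex and leaving out the mapping from inputenc option names (latin1, etc.) to encoding names XeTeX accepts:

```latex
% Sketch only: read one Latin-1 encoded file under XeTeX without inputenc.
% "oldchapter" is a hypothetical file name.
\XeTeXdefaultencoding "ISO-8859-1" % files opened from here on are read as Latin-1
\input{oldchapter}
\XeTeXdefaultencoding "utf8"       % switch back to UTF-8 for later files
```

This only helps when the code that opens the file is under the package's control.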
Neither package does anything with the macros defined by
inputenc.sty or utf8ienc.dtx, but the main one we'd be interested in,
\DeclareUnicodeCharacter, would be easy enough to rewrite for
"unicodeTeX".
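Under a Unicode engine each character is already a single token, so one possible rewrite is to make the character active and define it, using the usual \lowercase trick to create the active token. This is a hedged sketch, not the definition from utf8ienc.dtx; it assumes ~ is active when the macro is defined (the normal LaTeX setting) and that the first argument is a hex code point:

```latex
% Sketch only: \DeclareUnicodeCharacter{<hex code point>}{<text>}
% for XeTeX/LuaTeX, where each Unicode character is one token.
\def\DeclareUnicodeCharacter#1#2{%
  \begingroup
    \lccode`\~="#1\relax          % let \lowercase turn ~ into U+#1
    \lowercase{\endgroup\def~}{#2}% define the *active* character U+#1
  \catcode"#1=\active             % later input reads U+#1 as active
}
% Example: \DeclareUnicodeCharacter{2103}{\ensuremath{^\circ}C}
```

The trade-off is that the character becomes active everywhere, which may interfere with its use in, for example, labels or file names.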
However, neither package is really suitable for maintenance by the
LaTeX3 Project. Elie's macros for supporting LuaTeX are obviously very
new; furthermore, his code tree contains a few intertwined packages
that it wouldn't yet be appropriate to add to the stable LaTeX2e code.
And the XeTeX version isn't robust enough, because there's no reliable
way to detect files read with \input and branch on them. (It only works
for packages that use \@input for auto-generated files.)
* * *
Therefore, my proposal is to take responsibility for inputenc under
XeTeX and LuaTeX away from the LaTeX2e code. I'm not sure of the best
way to achieve this; I recommend hard-coding it into inputenc, since
that's the simplest option for now. E.g., at the top of
inputenc.sty:
% check for XeTeX:
\begingroup\expandafter\expandafter\expandafter\endgroup
\expandafter\ifx\csname XeTeXversion\endcsname\relax\else
  \IfFileExists{xetex-inputenc.sty}{%
    \DeclareOption*{\PassOptionsToPackage{\CurrentOption}{xetex-inputenc}}%
    \ProcessOptions
    \RequirePackage{xetex-inputenc}%
    \expandafter\endinput % expands \fi first, so the conditional is closed
  }{}%
\fi
and something equivalent for LuaTeX.
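For instance, a LuaTeX branch could test for \luatexversion in the same way. This is a sketch only: the file name luainputenc.sty is an assumption, and since the LuaTeX package itself loads the original inputenc, it would need some guard against re-entering this branch, which is not shown:

```latex
% check for LuaTeX (sketch; "luainputenc" is an assumed package name):
\begingroup\expandafter\expandafter\expandafter\endgroup
\expandafter\ifx\csname luatexversion\endcsname\relax\else
  \IfFileExists{luainputenc.sty}{%
    \DeclareOption*{\PassOptionsToPackage{\CurrentOption}{luainputenc}}%
    \ProcessOptions
    \RequirePackage{luainputenc}%
    \expandafter\endinput % expands \fi first, so the conditional is closed
  }{}%
\fi
```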
Then the LaTeX team can "ignore" the problem for now (since it's plain
that LaTeX2e will never support Unicode natively) and leave it in the
hands of the developers who are actively writing macro packages for
the new engines.
As I said above, I'd like to resolve this for TeX Live 2009, and with
such a simple patch to inputenc I think we can do it.
Will