LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

From: Will Robertson <[log in to unmask]>
Date: Mon, 16 Mar 2009 17:23:00 +1030
Hello,

As TeX Live 2009 approaches (too quickly) I'd like to again raise the  
problems of inputenc in XeTeX and LuaTeX. Sorry to add to the  
distractions of the members of the LaTeX3 team who are also busy  
working on the expl3 code...

The problem is clear: LaTeX (via inputenc) assumes 8-bit encodings and
clobbers true UTF-8 input when the newer engines are used. This is an
issue because "\usepackage[utf8]{inputenc}" is the first thing that many
new users carry over when moving their documents to a new engine.

Elie Roux and I have been writing packages to attempt to work around  
the problem for LuaTeX and XeTeX, respectively. Our development code  
is here:
   <http://github.com/eroux/lualatex/tree/master>
   <http://github.com/wspr/xetex-inputenc/tree/master>

We'd like inputenc to branch to these packages in the appropriate  
context.

* * *

The LuaTeX version is more robust, but both packages attempt to "do  
the right thing" when they're loaded, keeping UTF8 untouched and  
processing the other encodings as best they can. In the LuaTeX  
package, the original inputenc is still loaded and used for  
transforming the chars.

In the XeTeX package, I take the lazy way out and simply read the  
files in the different encoding, bypassing inputenc entirely.
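
As a rough illustration of that approach (not the actual xetex-inputenc
code), XeTeX's \XeTeXinputencoding primitive changes the encoding used to
read the current file, starting from the following line. The encoding
name "iso-8859-1" is just an example:

	% a Latin-1 encoded file, processed by XeTeX
	\XeTeXinputencoding "iso-8859-1"
	% from here on, the bytes of this file are decoded as Latin-1
	% and converted to Unicode before TeX sees them

\XeTeXdefaultencoding works the same way for files opened afterwards.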

Neither package attempts to do anything with the macros defined by  
inputenc.sty or utf8ienc.dtx, but the main one we'd be interested in,  
\DeclareUnicodeCharacter, would be easy enough to re-write for  
"unicodeTeX".
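
A minimal sketch of what such a re-write might look like, assuming a
Unicode engine with e-TeX's \protected available and assuming that
inputenc's own utf8.def has not been loaded (so the name is still free).
This is only an illustration, not code from either package:

	% sketch only: #1 = code point in uppercase hex, #2 = replacement text
	\newcommand\DeclareUnicodeCharacter[2]{%
	  \begingroup
	    \lccode`\~="#1\relax         % map ~ onto the target character
	    \lowercase{\endgroup
	      \protected\def~}{#2}%      % define the active character
	  \catcode"#1=\active            % make the character active
	}
	% e.g. \DeclareUnicodeCharacter{00A0}{\nobreakspace}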

However, neither package is really suitable for maintenance by the
LaTeX3 Project. Elie's macros for supporting LuaTeX are obviously very
new; furthermore, his code tree contains a few intertwined packages
that it wouldn't yet be appropriate to add to the stable LaTeX2e code.
And the XeTeX version isn't robust enough, because there's no way to
reliably detect or branch on \input. (It only works for packages
that use \@input for auto-generated files.)

* * *

Therefore, my proposal is to take the responsibility for input
encodings under XeTeX and LuaTeX out of the LaTeX2e code. I'm not sure
exactly what the best way to achieve this is; I'm going to recommend
hard-coding it into inputenc, since that's the simplest option for now.
E.g., at the top of inputenc.sty:

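	% check for XeTeX:
	\begingroup\expandafter\expandafter\expandafter\endgroup
	\expandafter\ifx\csname XeTeXversion\endcsname\relax\else
	  \IfFileExists{xetex-inputenc.sty}{%
	    % hand all options over to the XeTeX-specific package
	    \DeclareOption*{\PassOptionsToPackage{\CurrentOption}{xetex-inputenc}}%
	    \ProcessOptions\relax
	    \RequirePackage{xetex-inputenc}%
	    % skip the rest of inputenc.sty; the "%" after "{}" below lets
	    % \expandafter reach \fi before \endinput takes effect
	    \expandafter\endinput
	  }{}%
	\fi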

and something equivalent for LuaTeX.
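
For illustration, the LuaTeX branch might look like the sketch below. It
mirrors the XeTeX test but checks for the \directlua primitive. The
package name "luainputenc" is an assumption here; substitute whatever
name Elie's package ends up being released under:

	% check for LuaTeX (sketch; package name "luainputenc" is assumed):
	\begingroup\expandafter\expandafter\expandafter\endgroup
	\expandafter\ifx\csname directlua\endcsname\relax\else
	  \IfFileExists{luainputenc.sty}{%
	    \DeclareOption*{\PassOptionsToPackage{\CurrentOption}{luainputenc}}%
	    \ProcessOptions\relax
	    \RequirePackage{luainputenc}%
	    \expandafter\endinput
	  }{}%
	\fi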

Then the LaTeX team can "ignore" the problem for now (since it's plain
that LaTeX2e will never natively support Unicode), and leave the
problem in the hands of the developers who are actively writing macro
packages for the new engines.

As I said above, I'd like to resolve this for TeX Live 2009, and with
such a simple patch to inputenc I think we can do it.

Will







