## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject: Re: inputenc for XeTeX and LuaTeX

From: Will Robertson

Reply To: Mailing list for the LaTeX3 project

Date: Wed, 18 Mar 2009 13:37:04 +1030
```
On 18/03/2009, at 9:19 AM, Frank Mittelbach wrote:

> Manuel Pégourié-Gonnard writes:
>> James Cloos wrote:
>>> As for utf-8 or other, it may be useful to default to the character
>>> set specified for the current $LOCALE. Maybe. :-/
>>>
>> Please don't make anything in the compilation of the document depend
>> on the locale! It would completely ruin portability of the source
>> files.
>
> Perhaps. It might be a straight path into long-term disaster. On the
> other hand, the whole area is a disaster in the first place. When we
> started out with inputenc in 2e I also thought that it was really good
> to keep the encoding with the file (which you do by stating
> \usepackage[latin1]{inputenc} and the like), and that worked fairly
> well for a while. But then OSes started to convert on the fly, so
> through cut-and-paste, sometimes even on the same machine, an old
> latin1 file got translated into something else (except for the string
> specifying the encoding inside)... so ... not easy, really.

Yep, agreed that dealing with encodings is annoying :)

>> A file must be assumed to be either utf-8 (an auxiliary file written
>> by XeTeX/LuaTeX) or in the encoding declared as the option of
>> inputenc. This is exactly what xetex-inputenc and luatex-inputenc do.
>>
>> The difficult problem is to guess when a file is an auxiliary file.
>> I suppose the heuristics for doing so will improve when the solution
>> gets tested.
>
> How much guessing is really needed? Are you targeting an existing 2e
> environment unchanged, or are you intending to design an interface
> that is robust if used? Or something in between?

Almost entirely the first.

Neither package needs to guess anything; the problem is that there's
just no way to know if \input refers to a generated file or a user file.

The XeTeX solution simply patches \@input. The LuaTeX solution does
something similar and allows customisation so that certain files or
file extensions can be treated as if they were read with \@input
rather than \input.

> - new solution, i.e. not for 2e as such: design a proper interface for
>   handling internal auxiliary file reading and writing. That would
>   then have hooks to maintain the encoding. We certainly have to do
>   something along those lines for expl3.

Yep.

> - partial 2e solution: use \@input as a proposed way to read internal
>   files back in (as suggested by Will) and handle those correctly.
>   Boo at those packages that use \input rather than \@input for their
>   internal files (which is already wrong in 2e proper), and either ask
>   them to change or ignore them.

Yep. I hadn't thought of it before, but we could add a note to the
documentation explicitly discussing this behaviour. Using \@input for
internally generated files is implicit in what it does, but nowhere
(that I know of) is it stated plainly.

Note that even the kernel uses \input on the .aux file somewhere :)

> - possible 2e solution: steal \openout to always write
>   \InternallyWrittenFileHookToHandleWhatWeNeedToHandle
>   to the top of each such file; fix the cases where this is not
>   appropriate in 2e, such as the filecontents env ... and wait for the
>   packages to blow up and fix those (probably only a few, if any).

Nice idea, and it would probably work, but the return on investment is
too low (for me at least). I expect non-UTF8 input in Xe(La)TeX
documents to be hardly ever used. And we can always foist the
responsibility off on the packages that don't work because of \input
vs. \@input.

* * *

So, assuming we want to do something about the whole situation (I hope
so), how open are you to the idea of adding branching to inputenc to
load packages that aren't under the LaTeX team's control? I'm more
than happy to print a big warning telling users what's going on.