LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Sender: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Wed, 30 Apr 2014 09:56:18 +0100
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
Content-Transfer-Encoding: 7bit
From: David Carlisle <[log in to unmask]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Organization: NAG
MIME-Version: 1.0

On 30/04/2014 05:15, Heiko Oberdiek wrote:
> On 29.04.2014 16:35, David Carlisle wrote:

>
> | inputenc package updates
> |
> | The inputenc package allows different input encodings for LaTeX
> | documents to be specified, importantly the utf8 option to specify
> | the Unicode UTF-8 encoding. A common mistake has been to use this
> | option with the Unicode-based TeX engines LuaTeX and XeTeX; however,
> | inputenc does not work (and is not needed) with those systems. The
> | package has been modified so that, if used with LuaTeX or XeTeX, it
> | just issues a warning if utf8 is specified, and stops with an error
> | for any other encoding requested.
>
> Encodings ascii.def and x-ascii.def
> -----------------------------------
> A use case for these encodings is that a document should not contain
> problematic non-ASCII characters. Thus inputenc complains if an 8-bit
> character or a control character is present. Since a valid document
> only contains 7-bit characters, this works very well with LuaTeX and
> XeTeX. IMHO, inputenc must not throw an error. Instead:
>
> * It should disable the 7-bit control characters, as package inputenc
>   does without LuaTeX/XeTeX.
> * The other characters in the Unicode range with codes > 127 are way
>   too many for inputenc to handle. (Each character would have to be
>   made active and defined to throw an error.) IMHO the best approach is
>   to ignore these characters.
> * Instead of a warning, an info message is enough (at most a warning),
>   saying that characters with codes > 127 are not handled by inputenc
>   for LuaTeX/XeTeX.
>
> Encodings utf8.def and utf8x.def
> --------------------------------
> The current version knows `utf8.def` and ignores it with a warning.
> However, there are documents that use utf8x.def of package ucs. The
> same arguments are valid here:
>
> * The document source is in UTF-8, which is more or less needed for
>   XeTeX/LuaTeX.
> * Thus an error would be wrong and a warning is enough. That should
>   also be the case for utf8x.def.
>
> If someone uses a document written for the old engines:
>
> \documentclass{article}
> \usepackage[utf8]{inputenc}
> \usepackage[T1]{fontenc} % or without
> \begin{document}
>
> and gets a warning that utf8 is not required for LuaTeX/XeTeX, then he
> runs into trouble (both with and without \usepackage[utf8]{inputenc}):
> the characters with codes > 127 are not mapped to the proper LICRs,
> but use the slots of the font encoding OT1 or T1 (for example), with
> the consequence that characters are missing or wrong. Thus a hint
> that package fontspec might be useful should be added to the inputenc
> warning.
>
> Yours sincerely
> Heiko Oberdiek
>

Heiko, thanks.

This is a personal response, not checked with the rest of the team, and I
haven't checked any changes into the sources yet :-)


I'd wondered about mentioning fontspec originally but didn't, as fontspec
only works on these engines, and if you are doing

\usepackage{ifxetex}% defines \ifxetex
\ifxetex
\usepackage{fontspec}
\setmainfont...
\else
\usepackage[utf8]{inputenc}% only needed for the 8-bit engines
...
\fi

then you would load the inputenc package in the \else branch, and the
problem this change was trying to help with (inputenc not working with
xetex/luatex) would not be an issue. But perhaps you are right and
mentioning fontspec would help.
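As a sketch only: an \ifxetex test on its own sends LuaTeX down the \else branch, where inputenc would be loaded again. A preamble that keeps inputenc away from both Unicode engines could combine the ifxetex and ifluatex packages. The \ifunicodeengine flag is a hypothetical name introduced here, and the font set-up is just a placeholder:

```latex
\documentclass{article}
\usepackage{ifxetex}  % provides \ifxetex
\usepackage{ifluatex} % provides \ifluatex
% \ifunicodeengine is a hypothetical flag: true under XeTeX or LuaTeX
\newif\ifunicodeengine
\ifxetex  \unicodeenginetrue\fi
\ifluatex \unicodeenginetrue\fi
\ifunicodeengine
  % Unicode engines read UTF-8 natively, so no inputenc here;
  % fontspec gives access to fonts (add \setmainfont as needed)
  \usepackage{fontspec}
\else
  % pdfTeX: inputenc maps UTF-8 input to LICRs, fontenc selects T1 fonts
  \usepackage[utf8]{inputenc}
  \usepackage[T1]{fontenc}
\fi
\begin{document}
Grüße
\end{document}
```

The same source then compiles with pdflatex, xelatex or lualatex without inputenc ever being loaded on the Unicode engines.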

On ascii: I think it hadn't occurred to me that anyone was using inputenc
with ascii :) Unlike on pdftex, it's not feasible to make all the
non-ASCII characters invalid, but on the grounds of not breaking existing
documents I think you are right that we should let ascii, x-ascii and
utf8x through.

It would be possible to do as you suggest above and disable the control
characters below 32, but I think it's more important on Unicode engines
to move away from using active characters for encoding support, so I'd
rather just treat all these as aliases for the engine's native UTF-8
support.

That would mean something like

\else
%    \end{macrocode}
% Unicode-based TeX engines do not require this package at all, and always
% use UTF-8 input. Allow the package to be used if the |[utf8]|, |[utf8x]|,
% |[ascii]| or |[x-ascii]| option is given, to simplify switching between
% TeX engines.
%    \begin{macrocode}
\def\inputencoding#1{%
  \edef\reserved@a{#1}% the requested encoding name
  \@tempswafalse
  % set the switch if the name is one of the accepted aliases for UTF-8
  \@tfor\reserved@b:={utf8}{utf8x}{ascii}{x-ascii}\do{%
    \ifx\reserved@a\reserved@b\@tempswatrue\fi}%
  \if@tempswa
    % accepted name: only warn, the engine reads UTF-8 natively
    \PackageWarningNoLine
      {inputenc}%
      {inputenc not required for xetex or luatex.\MessageBreak
       utf8 assumed.\MessageBreak
       The fontspec package may be required to\MessageBreak
       access suitable fonts}%
  \else
    % any other encoding: stop with an error
    \PackageError
      {inputenc}%
      {inputenc not required for xetex or luatex.\MessageBreak
       only UTF-8 supported}%
      {For xelatex or lualatex do not load inputenc or use [utf8] option.}%
  \fi}
\fi
%    \end{macrocode}
% \end{macro}


which makes

! Package inputenc Error: inputenc not required for xetex or luatex.
(inputenc)                only UTF-8 supported.

See the inputenc package documentation for explanation.
Type  H <return>  for immediate help.
  ...

l.156 \endinput

? h
For xelatex or lualatex do not load inputenc or use [utf8] option.
?

for an unsupported encoding; for utf8(x) or (x-)ascii it gives

Package inputenc Warning: inputenc not required for xetex or luatex.
(inputenc)                utf8 assumed.
(inputenc)                The fontspec package may be required to
(inputenc)                access suitable fonts.
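
A small test file for trying out the proposed behaviour might look like the following. It assumes an inputenc that contains an \inputencoding definition along the lines of the one above; the option chosen is just an example, and the messages mentioned in the comments are what that definition should produce, not tested output:

```latex
% Run with xelatex or lualatex, assuming the proposed \inputencoding.
\documentclass{article}
% x-ascii is one of the accepted names (utf8, utf8x, ascii, x-ascii), so
% this should give the "utf8 assumed" warning. Replacing it with any
% other name, e.g. latin1, should instead stop with "only UTF-8 supported".
\usepackage[x-ascii]{inputenc}
\begin{document}
Grüße % the engine reads this as UTF-8 whatever the option says
\end{document}
```

Change the option in place rather than adding a second \usepackage line, since loading inputenc twice with different options would give an option clash error instead.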



It might be argued that the message should be different in the utf8 and
ascii cases, but I think the real message is that no macro-layer
encoding switch happens on these engines. The purpose of the change was
to flag a top-level message or error, rather than letting the internal
catcode-changing loops generate spurious errors, as that code wasn't
written for a multibyte base encoding. So as far as possible I think the
xetex behaviour of inputenc should be to give a message or error and do
nothing, rather than really "support" lots of different named encodings
as it does with pdflatex.

David






