LATEX-L Archives

Mailing list for the LaTeX3 project


Mailing list for the LaTeX3 project <[log in to unmask]>
Wed, 10 Feb 2010 15:27:30 -0500
<[log in to unmask]> (Joseph Wright's message of "Wed, 10 Feb 2010 18:56:23 +0000")
William F Hammond <[log in to unmask]>
Joseph Wright <[log in to unmask]> writes:

> . . .
> I was thinking of input encodings, where my point was (supposed to) be
> that something like the inputenc "utf8" approach would be an approach
> I hope we can avoid as there are better solutions (in the form of
> engines which deal with the issue). (Of course, that leaves UTF-16
> issues, but I'd hope that engine developments can help out).
> (I'd point out that LaTeX3 code is intended for use in new documents,
> and the rest of the computer world is standardising on UTF-8 as far as
> I can see. So I'd hope very much that having an approach based on this
> concept is not too risky.)

At some point I expect that LaTeX will want to provide for arbitrary
Unicode "word" characters in command names.  If that is the case, then
shouldn't the standard handling of text encodings for whole document
instances apply?  In particular, I think that text-encoding
pre-processing (by something like GNU "recode") to meet the needs of
the particular TeX engine would be the way to proceed.
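
A minimal sketch of that kind of pre-processing, written in Python
rather than with recode itself: decode the document from its source
encoding, then re-encode it in whatever the engine expects.  The
function name "transcode" and the sample text are illustrative
assumptions, not part of any LaTeX tooling.

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode raw document bytes from `source` to `target`.

    Decoding is strict, so malformed input raises UnicodeDecodeError
    instead of being silently altered.
    """
    text = data.decode(source)   # bytes -> Unicode string
    return text.encode(target)   # Unicode string -> bytes


# Hypothetical example: a UTF-16 document (with byte-order mark) is
# converted to UTF-8 before being handed to the TeX engine.
utf16_doc = "\\section{Café}".encode("utf-16")
utf8_doc = transcode(utf16_doc, "utf-16", "utf-8")
```

In a real pipeline the bytes would come from and go to files; the
engine then only ever sees the one encoding it was built for.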

Presumably the UTF-8 and UTF-16 text encodings for Unicode are both
supported everywhere in the XML world.  While UTF-8 is more efficient
for Western languages, UTF-16 is likely to be favored in regions where
the Unicode blocks devoted to local character sets take more than 2
bytes per character in UTF-8.
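
As a rough check of the size trade-off between the two encodings, the
following Python sketch counts the bytes each one needs for a single
character.  The helper name "encoded_sizes" and the sample characters
are assumptions chosen for illustration.

```python
def encoded_sizes(ch: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character.

    "utf-16-le" is used so that no byte-order mark is counted.
    """
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# ASCII letter, accented Latin letter, CJK ideograph, and a character
# outside the Basic Multilingual Plane (U+1D518).
samples = ["a", "\u00e9", "\u4e2d", "\U0001D518"]
sizes = {ch: encoded_sizes(ch) for ch in samples}

for ch, (utf8_len, utf16_len) in sizes.items():
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} bytes, UTF-16 {utf16_len} bytes")
```

ASCII text is half the size in UTF-8, while a CJK ideograph in the
Basic Multilingual Plane takes 3 bytes in UTF-8 against 2 in UTF-16.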

I also imagine that eventually many documents will originate under
author-level XML document types, and so what is fed to a TeX engine
would then be the output of a pipeline.  In that case I suppose UTF-8
would be a reasonable standard.  Let me also point out that, to the
extent that XML origination is realized, TeX engines might never need
to bite the bullet on non-ASCII command names.  That is, LaTeX
provision of non-ASCII command names could be handled via XML front
ends that are sponsored by the LaTeX Project.

                                    -- Bill