LATEX-L Archives

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


LATEX-L  October 2011

Subject:

Re: Strings, and regular expressions

From:

Heiko Oberdiek

Reply-To:

Mailing list for the LaTeX3 project

Date:

Wed, 12 Oct 2011 11:39:58 +0200

Content-Type:

text/plain

On Tue, Oct 11, 2011 at 11:07:13PM -0400, Bruno Le Floch wrote:

> > hyperref already reencodes bookmark strings with the setting
> > pdfencoding=auto. The bookmark string is constructed in
> > Unicode encoding. Then reencoding to PDFDocEncoding is tried.
> > If successful, the resulting string is used, otherwise the Unicode string.
> > The reencoding is done by package stringenc and does not need
> > to be expandable for hyperref.
> 
> Thank you Heiko. The stringenc package provides _many_ different
> encodings. Can you point me to which are useful for pdf purposes? 

Most important for PDF strings:

* PDFDocEncoding
* UTF-16

(hyperref also uses "ascii-print" in case of XeTeX because of
encoding problems with \special.)
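
The choice between the two encodings can be sketched in Python. This is only an illustration, not how hyperref or stringenc implement it. The function name is made up, and printable ASCII is used as a stand-in for PDFDocEncoding (the real encoding covers more characters, e.g. many accented Latin letters):

```python
import codecs


def encode_pdf_text_string(text: str) -> bytes:
    """Return bytes for a PDF text string, preferring the 8-bit form."""
    # Assumption: printable ASCII stands in for PDFDocEncoding here.
    # The real PDFDocEncoding agrees with ASCII in this range but
    # also defines many further characters.
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        return text.encode("ascii")
    # Otherwise fall back to UTF-16, big-endian, with the byte order
    # mark FE FF in front so that a reader can recognize the encoding.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

A title such as "Intro" stays a plain 8-bit string, while a title containing a Greek letter is written as UTF-16BE with a leading byte order mark.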

> I guess that most "iso-..." and "cp..." encodings are an overkill for
> a kernel.

They should be loadable as files similar to LaTeX's .def files
for inputenc or fontenc. Then the kernel can provide a base set
and others can be provided by other projects. But I don't see
the disadvantage if such a base set is not minimal.

Then, when strings are written to PS/PDF, they need further
escaping:
* String escaping, provided by \pdfescapestring.
* Name escaping, provided by \pdfescapename.
* Hex strings, provided by \pdfescapehex.
The last of these is also useful in other contexts, e.g. for protecting
arbitrary string data in auxiliary files: in a hex string, special
characters like '{', '}', '\', '#', ... do no harm.
  For LuaTeX, these pdfTeX features are provided by package `pdftexcmds';
package `pdfescape' provides them for other engines.
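
To show what hex and string escaping do at the byte level, here is a small Python sketch. The function names are hypothetical, and the output is not claimed to match \pdfescapehex or \pdfescapestring byte for byte. The string escaping follows the PDF literal string syntax (backslash before '\', '(' and ')', octal for other bytes); writing every byte outside 0x21-0x7E as octal, including the space, is a conservative choice of this sketch:

```python
def escape_hex(data: bytes) -> str:
    """Write each byte as two hex digits, so no special character survives."""
    return data.hex().upper()


def unescape_hex(hex_text: str) -> bytes:
    """Inverse of escape_hex."""
    return bytes.fromhex(hex_text)


def escape_literal_string(data: bytes) -> str:
    """Escape bytes for use inside a PDF literal string ( ... )."""
    out = []
    for byte in data:
        ch = chr(byte)
        if ch in "\\()":
            # Backslash and parentheses get a backslash in front.
            out.append("\\" + ch)
        elif 0x21 <= byte <= 0x7E:
            # Visible ASCII is copied unchanged.
            out.append(ch)
        else:
            # Everything else becomes a three-digit octal escape.
            out.append("\\%03o" % byte)
    return "".join(out)
```

The hex form contains only digits and the letters A to F, which is why it can pass through an auxiliary file without catcode trouble.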

> Also, when you say "Unicode encoding", I presume that this means
> native strings for XeTeX and LuaTeX, but what about pdfTeX? Do you use
> "UTF-16" (if so, LE or BE?), or some other UTF?

In the context of bookmarks and other PDF strings "Unicode"
means UTF-16 (hyperref uses big-endian and also writes a byte order mark).
And the strings are sequences of bytes. The large character codes of
XeTeX or LuaTeX don't help here, because they get written as UTF-8.
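
The difference between the two byte sequences can be seen in a short Python sketch. The names are made up for illustration, and the ASCII branch is again only a stand-in for PDFDocEncoding:

```python
import codecs


def decode_pdf_text_string(raw: bytes) -> str:
    """Decode a PDF text string, using the byte order mark as the switch."""
    if raw.startswith(codecs.BOM_UTF16_BE):
        # FE FF in front: the rest is UTF-16, big-endian.
        return raw[2:].decode("utf-16-be")
    # Assumption: ASCII stands in for PDFDocEncoding in this sketch.
    return raw.decode("ascii")


title = "\u00e4"  # LATIN SMALL LETTER A WITH DIAERESIS

# What an engine writes when it outputs the character as UTF-8.
utf8_bytes = title.encode("utf-8")

# What a PDF text string in "Unicode" needs: BOM plus UTF-16BE.
utf16_bytes = codecs.BOM_UTF16_BE + title.encode("utf-16-be")

# Characters outside the BMP take a surrogate pair, i.e. four bytes.
clef_bytes = "\U0001D11E".encode("utf-16-be")
```

Here `utf8_bytes` is C3 A4 while `utf16_bytes` is FE FF 00 E4, so the UTF-8 output cannot be used directly as a UTF-16 PDF string.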

Yours sincerely
  Heiko Oberdiek
