On Tue, Oct 11, 2011 at 11:07:13PM -0400, Bruno Le Floch wrote:
> > hyperref already reencodes bookmark strings with setting
> > pdfencoding=auto. The bookmark string is constructed in
> > Unicode encoding. Then the reencoding to PDFDocEncoding is tried.
> > If successful the result string is used, otherwise the Unicode string.
> > For the reencoding stuff package stringenc is used and doesn't need
> > to be expandable for hyperref.
>
> Thank you Heiko. The stringenc package provides _many_ different
> encodings. Can you point me to which are useful for pdf purposes?

Most important for PDF strings:
* PDFDocEncoding
* UTF-16
(hyperref also uses "ascii-print" in case of XeTeX because of
encoding problems with \special.)

> I guess that most "iso-..." and "cp..." encodings are an overkill for
> a kernel.

They should be loadable as files, similar to LaTeX's .def files
for inputenc or fontenc. Then the kernel can provide a base set
and other encodings can be provided by other projects. But I don't
see a disadvantage if such a base set is not minimal.

Then, when strings are written to PS/PDF, they need further escaping:
* String escaping, provided by \pdfescapestring.
* Name escaping, provided by \pdfescapename.
* Hex strings, provided by \pdfescapehex.
The latter is also useful in other contexts, e.g. for protecting
arbitrary string data in auxiliary files. Inside a hex string,
special characters like '{', '}', '\', '#', ... can do no harm.

For LuaTeX, these pdfTeX features are provided by package
`pdftexcmds'; package `pdfescape' provides them for other engines.

> Also, when you say "Unicode encoding", I presume that this means
> native strings for XeTeX and LuaTeX, but what about pdfTeX? Do you use
> "UTF-16" (if so, LE or BE?), or some other UTF?

In the context of bookmarks and other PDF strings, "Unicode" means
UTF-16 (hyperref uses BE, preceded by a byte order mark).
And the strings are sequences of bytes.
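To make the steps above concrete, here is a minimal Python sketch. It is not hyperref's or pdfTeX's code, and the function names are hypothetical. It tries a deliberately conservative stand-in for PDFDocEncoding (printable ASCII only, an assumption; the real PDFDocEncoding table covers more characters), falls back to UTF-16BE with the FE FF byte order mark, and then applies string, name and hex escaping in the spirit of \pdfescapestring, \pdfescapename and \pdfescapehex. The exact output of the pdfTeX primitives may differ in details.

```python
# Sketch only: simplified model of building and escaping PDF strings.
# Assumption: printable ASCII is used as a conservative subset of
# PDFDocEncoding; anything else falls back to UTF-16BE with a BOM.

BOM_UTF16BE = b"\xfe\xff"  # byte order mark in front of UTF-16BE text strings

# Delimiters and '#' that must not appear literally inside a PDF name.
NAME_SPECIAL = set(b"()<>[]{}/%#")


def encode_text_string(text: str) -> bytes:
    """Try the (simplified) PDFDocEncoding, else use UTF-16BE with BOM."""
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        # Printable ASCII has the same byte values in PDFDocEncoding.
        return text.encode("ascii")
    return BOM_UTF16BE + text.encode("utf-16-be")


def escape_string(data: bytes) -> str:
    """Escape bytes for a literal string (...), like \\pdfescapestring."""
    out = []
    for b in data:
        if b in b"()\\":
            out.append("\\" + chr(b))  # backslash before ( ) and \
        elif 0x20 <= b <= 0x7E:
            out.append(chr(b))  # other printable ASCII stays as-is
        else:
            out.append("\\%03o" % b)  # three-digit octal escape
    return "".join(out)


def escape_name(data: bytes) -> str:
    """Escape bytes for a PDF name object, like \\pdfescapename."""
    out = []
    for b in data:
        if 0x21 <= b <= 0x7E and b not in NAME_SPECIAL:
            out.append(chr(b))  # regular character
        else:
            out.append("#%02X" % b)  # '#' followed by two hex digits
    return "".join(out)


def escape_hex(data: bytes) -> str:
    """Hex-encode bytes, like \\pdfescapehex; the result is safe anywhere."""
    return data.hex().upper()


if __name__ == "__main__":
    print(escape_string(encode_text_string("Intro (draft)")))  # Intro \(draft\)
    print(escape_hex(encode_text_string("Größe")))  # FEFF0047007200F600DF0065
    print(escape_name(b"sec#1 a"))  # sec#231#20a
    # Same title as UTF-8 bytes, for comparison with the UTF-16BE bytes above.
    print("Größe".encode("utf-8").hex().upper())  # 4772C3B6C39F65
```

The last line shows that the UTF-8 bytes of a non-ASCII title are not the UTF-16BE byte sequence a PDF text string expects.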
The big chars of XeTeX or LuaTeX don't help, because they
get written as UTF-8.

Yours sincerely
  Heiko Oberdiek