On Wed, Feb 10, 2010 at 11:30:30AM +0000, Joseph Wright wrote:
> On 10/02/2010 10:09, Heiko Oberdiek wrote:
> >* Encoding conversions, see package `stringenc'.
> > Application: PDF (outlines and other text fields).
> At present we seem to have stayed away from encodings. My own
> preference is to leave things to LaTeX2e when working as a package
> and to use the "native" encoding only for the format (with UTF-8
> engines available this seems sensible to me).
There are encoding issues independent of TeX input.
For example, outlines can be given as strings in PDFDocEncoding
or in Unicode (UTF-16). In hyperref, with option pdfencoding=auto,
the string is first encoded as Unicode; then PDFDocEncoding is
tried and used if the string fits into it.
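A rough sketch of that auto logic, in Python for illustration only (PDFDocEncoding is approximated here by Latin-1; the real table differs from Latin-1 in a handful of code points, so this is not a faithful implementation):

```python
def pdf_string(text: str) -> bytes:
    """Encode a PDF text string: prefer the 8-bit encoding,
    fall back to UTF-16BE with a byte order mark.

    PDFDocEncoding is approximated by Latin-1 here; the real
    encoding differs in a few code points.
    """
    try:
        # Try the 8-bit encoding first: shorter and readable
        # by more PDF tools.
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # Otherwise use UTF-16BE, marked by the BOM FE FF as
        # the PDF format requires for Unicode text strings.
        return b"\xfe\xff" + text.encode("utf-16-be")
```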
Also `encoding' can be understood in a more general way, including
hex strings, strings in ASCII85, quoted strings in C/Lua/..., ...
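For illustration, here is the same byte string in two of those representations, sketched in Python (the standard base64 module happens to provide the Adobe ASCII85 variant used by PostScript/PDF):

```python
import base64

data = b"Hello"

# Hexadecimal string form, as it would appear in a PDF file:
# angle brackets around uppercase hex digits.
hex_form = "<" + data.hex().upper() + ">"

# ASCII85 with Adobe framing (<~ ... ~>), as used in
# PostScript and PDF stream filters.
a85_form = base64.a85encode(data, adobe=True).decode("ascii")
```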
> >* Matching (replacing) using regular expressions,
> > see \pdfmatch and luaTeX.
> > Matching is useful for extracting information pieces or
> > validating option values, ...
> > Unhappily \pdfmatch has still the status "experimental"
> > and the regular expression language differs from Lua's.
> I think we'll be staying away from this. XeTeX has no equivalent of
> \pdfmatch, and as you say the LuaTeX version works differently from
> the pdfTeX one.
The authors/maintainers of XeTeX, pdfTeX, and LuaTeX could agree
on a single regular-expression syntax supported by all three engines.
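To make the divergence concrete, consider validating an option value such as `12pt'. A small Python illustration (as far as I recall, the POSIX-style bracket expression below is what pdfTeX's \pdfmatch accepts, while Lua patterns would spell the same digit class as `%d', so `^%d+pt$'):

```python
import re

# POSIX-style bracket expression for a digit; Lua's pattern
# library has no such bracket class syntax and writes %d instead.
pattern = re.compile(r"^[0-9]+pt$")

def is_pt_dimension(value: str) -> bool:
    """Check whether an option value looks like `<digits>pt'."""
    return pattern.match(value) is not None
```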
> [At present, we only *require* e-TeX in any case,
> although an engine with \(pdf)strcmp available is very useful.]
If an expandable version is not required, then \pdfstrcmp
can be implemented in virgin TeX. If the input of \pdfstrcmp
consists only of `other' and perhaps `space' tokens, then
even an expandable version can be simulated. The main problem,
AFAIK, is the conversion of a general token list to its string
representation in an expandable way (that is, without assignments).
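As a sketch of the intended semantics only (not of a TeX implementation): \pdfstrcmp compares the detokenized expansions of its two arguments byte by byte and yields -1, 0, or 1. In Python, assuming the inputs are already plain strings:

```python
def strcmp(a: str, b: str) -> int:
    """Mimic \\pdfstrcmp on already-detokenized strings:
    -1 if a sorts before b, 0 if equal, 1 if a sorts after b.
    For ASCII input, Python's code-point comparison matches
    the byte-wise comparison the primitive performs."""
    if a == b:
        return 0
    return -1 if a < b else 1
```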
Heiko <[log in to unmask]>