On Wed, Feb 10, 2010 at 11:30:30AM +0000, Joseph Wright wrote:

> On 10/02/2010 10:09, Heiko Oberdiek wrote:
> > * Encoding conversions, see package `stringenc'.
> >   Application: PDF (outlines and other text fields).
>
> At present we seem to have stayed away from encodings. My own
> preference is to leave things to LaTeX2e when working as a package
> and to use the "native" encoding only for the format (with UTF-8
> engines available this seems sensible to me).

There are encoding issues independent of TeX input. For example,
outlines can be given as strings in PDFDocEncoding or in Unicode
(UTF-16). With hyperref's option pdfencoding=auto, the string is
first encoded as Unicode; then PDFDocEncoding is tried and used if
the string fits into it.

Also, `encoding' can be understood in a broader sense that includes
hex strings, strings in ASCII85, quoted strings in C/Lua/..., ...

> > * Matching (replacing) using regular expressions,
> >   see \pdfmatch and LuaTeX.
> >   Matching is useful for extracting pieces of information or
> >   validating option values, ...
> >   Unhappily \pdfmatch still has the status "experimental",
> >   and its regular expression language differs from Lua's.
>
> I think we'll be staying away from this. XeTeX has no equivalent of
> \pdfmatch, and as you say the LuaTeX version works differently from
> the pdfTeX one.

The authors/maintainers of XeTeX/pdfTeX/LuaTeX could agree on one
version, supported by all these engines.

> [At present, we only *require* e-TeX in any case,
> although an engine with \(pdf)strcmp available is very useful.]

If an expandable version is not required, then \pdfstrcmp can be
implemented in virgin TeX. If the input of \pdfstrcmp consists of
`other' and perhaps `space' tokens, then even an expandable version
can be simulated. The main problem, AFAIK, is converting a general
token list to its string representation in an expandable way (that
is, without having \edef).

Yours sincerely
  Heiko <[log in to unmask]>
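P.S. To illustrate the pdfencoding=auto strategy described above, here is a
minimal Python sketch. This is not hyperref's actual code: the function name
pdf_text_string is my own, and the PDFDocEncoding test is simplified to
printable ASCII (the real encoding is a Latin-1 variant with a few
differences).

```python
def pdf_text_string(s: str) -> bytes:
    """Return bytes for a PDF text string (e.g. an outline entry).

    Sketch of the "try the one-byte encoding, else fall back to
    Unicode" idea: if every character is printable ASCII, the string
    fits the simplified PDFDocEncoding subset used here; otherwise
    emit UTF-16BE with the leading byte order mark FE FF, which is
    how PDF marks a Unicode text string.
    """
    if all(0x20 <= ord(c) <= 0x7E for c in s):
        return s.encode("ascii")                 # fits the one-byte encoding
    return b"\xfe\xff" + s.encode("utf-16-be")   # Unicode (UTF-16BE) with BOM
```

A plain title like "Hello" stays a one-byte string, while a title containing
non-ASCII characters comes out as a BOM-prefixed UTF-16BE string.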
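P.P.S. The result of \pdfstrcmp is easy to state even though the expandable
TeX-side implementation is the hard part; a small Python model of just the
comparison semantics (my own sketch, not engine code):

```python
def pdfstrcmp(a: str, b: str) -> int:
    # Model of the \pdfstrcmp result: -1, 0, or 1 depending on the
    # byte-wise (lexicographic) order of the two expanded strings.
    # The TeX-side difficulty -- turning a general token list into its
    # string form without \edef -- is deliberately not modelled here.
    ab, bb = a.encode("utf-8"), b.encode("utf-8")
    return (ab > bb) - (ab < bb)
```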