## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Sender: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Wed, 10 Feb 2010 11:30:30 +0000
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
Subject: Re: String module
From: Joseph Wright <[log in to unmask]>

On 10/02/2010 10:09, Heiko Oberdiek wrote:
>> The first "big" question is what exactly is a string in a TeX
>> context.
>
> Looking at the result of \string, \meaning, \detokenize, \jobname, ...
> it's quite clear:
> A string is a sequence of catcode 12 tokens, with the exception
> that the space (charcode 32) has catcode 10.

I was intending to mean not as far as the engine is concerned (where of course you are correct) but more for programmers, taking account of the point that the existing "strings" packages don't seem to take the TeX approach to what constitutes a string.

> The latter is quite unhappy, because it makes string processing
> unnecessarily troublesome. For example, a space cannot be caught
> as an undelimited parameter.
>
> ==> Question: Catcode of space (10 or 12)?

This occurred to me, too. I was leaning toward "everything catcode 12" for that reason.

>> You also have to worry about what happens with special characters
>> (for example, how do you get % into a string). If you escape things
>> at the input stage [say \% => % (catcode 12)] then a simple
>> \detokenize will not work.
>
> I think that's beyond a string module. At the TeX input level, you can
> only input tokens under the authority of the current catcode settings.
> The input of the string module would rather be token lists, which
> get converted to strings, basically catcode 12 tokens. And the output
> would also be strings.

Again, I was thinking about what is already "out there".
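[As a quick illustration of the representation Heiko describes: the following sketch (plain pdfLaTeX; the macro names are invented for the example) shows \detokenize producing the "string" form, and why the space catcode matters.]

```latex
\documentclass{article}
\begin{document}
% \detokenize converts a token list into "string" tokens: every
% character becomes catcode 12 ("other"), except that spaces keep
% charcode 32 with catcode 10 -- the behaviour shared by \string,
% \meaning and \jobname.
\edef\mystring{\detokenize{Hello \textbf{world}}}
% \meaning shows the stored catcode-12 material:
\texttt{\meaning\mystring}
% Because the space in the result still has catcode 10, a macro
% taking an undelimited argument cannot grab it: TeX skips spaces
% when scanning undelimited arguments. Hence the thread's
% "catcode of space: 10 or 12?" question.
\end{document}
```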
You'll see that stringstrings, for example, takes a rather detailed approach to this type of problem. Whether that is right I'm not sure!

> * For dealing with PDF or PostScript it is useful to have conversions
>   from and to PDF/PS names, PDF/PS strings and hex strings.
>   See the pdfTeX primitives \pdfescapehex et al.,
>   and the packages `pdfescape' and `pdftexcmds'.
> * Encoding conversions, see the package `stringenc'.
>   Application: PDF (outlines and other text fields).

At present we seem to have stayed away from encodings. My own preference is to leave things to LaTeX2e when working as a package, and to use the "native" encoding only for the format (with UTF-8 engines available this seems sensible to me).

> * Matching (and replacing) using regular expressions,
>   see \pdfmatch and LuaTeX.
>   Matching is useful for extracting pieces of information or
>   validating option values, ...
>   Unhappily \pdfmatch still has the status "experimental",
>   and its regular expression language differs from Lua's.

I think we'll be staying away from this. XeTeX has no equivalent of \pdfmatch, and as you say the LuaTeX version works differently from the pdfTeX one. [At present, we only *require* e-TeX in any case, although an engine with \(pdf)strcmp available is very useful.]

-- 
Joseph Wright
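[For readers unfamiliar with the primitives mentioned above, a hedged sketch of the hex-conversion and string-comparison primitives follows. These are pdfTeX-specific, which is exactly the portability concern raised: XeTeX provides \strcmp but not the escape primitives or \pdfmatch.]

```latex
% Compile with pdflatex; these primitives are pdfTeX-specific.
\documentclass{article}
\begin{document}
% Hex-string conversion, useful when building PDF string objects:
\edef\hexform{\pdfescapehex{abc}}        % expands to 616263
\edef\original{\pdfunescapehex{616263}}  % expands back to abc
\texttt{\hexform\ / \original}

% Engine-level string comparison, the "\(pdf)strcmp" of the post:
% fully expandable, yielding -1, 0 or 1 as in C's strcmp.
\ifnum\pdfstrcmp{abc}{abd}<0 abc comes first\fi
\end{document}
```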