LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Subject:	Re: String module
From:	Joseph Wright
Reply To:	Mailing list for the LaTeX3 project
Date:	Wed, 10 Feb 2010 11:30:30 +0000

On 10/02/2010 10:09, Heiko Oberdiek wrote:
>> The first "big" question is what exactly is a string in a TeX
>> context.
>
> Looking at the result of \string, \meaning, \detokenize, \jobname, ...
> it's quite clear:
>    A string is a sequence of catcode 12 tokens with the exception
>    that the space (charcode 32) has catcode 10.

I was asking less about the engine's view (where of course you are
correct) and more about the programmer's view, given that the existing
"strings" packages don't seem to take the TeX approach to what
constitutes a string.

> The latter is quite unfortunate, because it makes string processing
> unnecessarily troublesome. For example, a space cannot be caught
> as an undelimited parameter.
>
> ==>  Question: Catcode of space (10 or 12)?

This occurred to me, too. I was leaning toward "everything catcode 12" 
for that reason.
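
To make the problem concrete, here is a minimal sketch, assuming an
e-TeX engine under LaTeX. The names \FirstOf, \StrEnd, \StrA, \StrB and
\OtherSpace are hypothetical and used only for this illustration. It
contrasts a \detokenize result (space at catcode 10) with a string whose
space has been forced to catcode 12 using the usual \lowercase trick.

```latex
\documentclass{article}
\begin{document}
% \FirstOf grabs the first item as an undelimited argument;
% \StrEnd is only a delimiter and is never executed.
\def\FirstOf#1#2\StrEnd{[#1]}

% \detokenize: "a" and "b" become catcode 12, the space stays catcode 10.
\edef\StrA{\detokenize{ ab}}
\expandafter\FirstOf\StrA\StrEnd % leading space is skipped, #1 is "a"

% Build a catcode-12 space: "*" (catcode 12) is lowercased to charcode 32.
\begingroup
\lccode`\*=32
\lowercase{\endgroup\def\OtherSpace{*}}
\edef\StrB{\OtherSpace\detokenize{ab}}
\expandafter\FirstOf\StrB\StrEnd % #1 is now the space itself
\end{document}
```

In the first call TeX discards the space before scanning the undelimited
argument, so the leading character of the string is lost; in the second
the catcode-12 space is grabbed like any other character.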

>> You also have to worry about what happens about special characters
>> (for example, how do you get % into a string). If you escape things
>> at the input stage [say \% =>  % (catcode 12)] then a simple
>> \detokenize will not work.
>
> I think that's beyond a string module. At TeX input level, you can only
> input tokens under the authority of the current catcode settings.
> The input of the string module would rather be token lists, that
> get converted to strings, basically catcode 12 tokens. And the output
> would also be strings.

Again, I was thinking about what is already "out there". You'll see that 
stringstrings, for example, takes a rather detailed approach to this 
type of problem. Whether that is right I'm not sure!
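
As a small illustration of the difference, the following sketch
(again assuming e-TeX under LaTeX; \StrC and \StrD are hypothetical
names) compares a plain \detokenize with one possible way of escaping at
the input stage. It is not how stringstrings does it, just the simplest
expandable approach I can think of.

```latex
\documentclass{article}
\begin{document}
\makeatletter
% Plain \detokenize keeps the escape: \StrC holds the four
% characters 5, 0, \ and %.
\edef\StrC{\detokenize{50\%}}
% \string\% gives a catcode-12 backslash followed by a catcode-12 "%";
% \@gobble removes the backslash, so \StrD holds 5, 0 and %.
\edef\StrD{50\expandafter\@gobble\string\%}
\makeatother
\texttt{\meaning\StrC} versus \texttt{\meaning\StrD}
\end{document}
```

Any interface that wants "\% means %" therefore needs an explicit
conversion step of this kind for every escaped character, which is
where the detail in the existing packages comes from.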

> * For dealing with PDF or PostScript it is useful to have conversions
>    from and to PDF/PS names, PDF/PS strings and hex strings.
>    See pdfTeX primitives \pdfescapehex et al.
>    and packages `pdfescape' and `pdftexcmds'.
> * Encoding conversions, see package `stringenc'.
>    Application: PDF (outlines and other text fields).

At present we seem to have stayed away from encodings. My own preference 
is to leave things to LaTeX2e when working as a package and to use the 
"native" encoding only for the format (with UTF-8 engines available this 
seems sensible to me).

> * Matching (replacing) using regular expressions,
>    see \pdfmatch and LuaTeX.
>    Matching is useful for extracting pieces of information or
>    validating option values, ...
>    Unfortunately, \pdfmatch still has the status "experimental"
>    and the regular expression language differs from Lua's.

I think we'll be staying away from this. XeTeX has no equivalent of 
\pdfmatch, and as you say the LuaTeX version works differently from the 
pdfTeX one. [At present, we only *require* e-TeX in any case, although 
an engine with \(pdf)strcmp available is very useful.]
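
For anyone who has not used it, a minimal sketch of a string comparison
follows. It assumes the pdftexcmds package mentioned above, whose
\pdf@strcmp wraps the engine-specific comparison and expands to -1, 0 or
1. The test strings are arbitrary.

```latex
\documentclass{article}
\usepackage{pdftexcmds}
\begin{document}
\makeatletter
% "abc" sorts before "abd", so the comparison yields -1.
\ifnum\pdf@strcmp{abc}{abd}<0
  first string sorts earlier%
\else
  first string does not sort earlier%
\fi
\makeatother
\end{document}
```

Because the comparison is expandable, it can be used inside \edef or a
\csname construction, which is a large part of why it is so useful.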
-- 
Joseph Wright
