LISTSERV - LATEX-L Archives - LISTSERV.UNI-HEIDELBERG.DE

LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


Sender:	Mailing list for the LaTeX3 project <[log in to unmask]>
Date:	Sun, 16 Oct 2011 21:36:47 -0400
Reply-To:	Mailing list for the LaTeX3 project <[log in to unmask]>
Subject:	Re: Strings, and regular expressions
From:	Bruno Le Floch <[log in to unmask]>

On 10/16/11, Joseph Wright <[log in to unmask]> wrote:
> On 10/10/2011 16:07, Bruno Le Floch wrote:
>> The l3str module provides functions to get the length of a string,
>> extract substrings or individual characters, testing for string
>> equality (the current \str_if_eq:nnTF). Some support for encodings is
>> provided: percent encoding, conversion from utf-8 to a string of
>> bytes, and most functions of Heiko Oberdiek's pdfescape package.
>> [...] highly welcome.
>
> Some comments having read the code and documentation.

Thank you Joseph for the cleanup.

> I don't like the name in \str_from_to:nnn - it sounds like a copy
> function. What's wrong with \str_substr:nnn or just \str_sub:nnn?

I couldn't think of an unambiguous name. \str_substr:nnn is fine.

> In the same function, the indexing is described as "\meta{start index}
> (inclusive) and \meta{end index} (exclusive)". This seems very odd to me
> - I'd expect
>
>   \str_from_to:nnn { abcdef } { 1 } { 4 }
>
> to leave "bcde" in the input stream.

I followed the Python convention, in which you think of the index as
lying between pairs of characters:

(0)a(1)b(2)c(3)d(4)e(5)f(6)

Hence, extracting from 1 to 4 gives "bcd". The advantage of doing it
that way is that the length of the result is simply 4 - 1 = 3. Another
advantage is that getting the first <n> characters is easy:
\str_substr:nnn { <string> } { 0 } { <n> }. A drawback is that getting
all characters from a given point to the end requires
\str_substr:nnn { <string> } { <n> } { \c_max_int } rather than
\str_substr:nnn { <string> } { <n> } { -1 }. Does that make sense?
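
For illustration, here is the same convention in Python itself. This is
only a sketch of the indexing rule, not l3str code; the helper name
substr is hypothetical and simply wraps Python's built-in slicing.

```python
def substr(s: str, start: int, end: int) -> str:
    """Hypothetical helper: characters of s from position start
    (inclusive) up to position end (exclusive)."""
    return s[start:end]


s = "abcdef"

# Positions lie between characters: (0)a(1)b(2)c(3)d(4)e(5)f(6)
middle = substr(s, 1, 4)        # the characters between (1) and (4)
first_two = substr(s, 0, 2)     # the first <n> characters, here n = 2
tail = substr(s, 2, len(s))     # everything from position 2 to the end

# The length of the result is always end - start (within bounds).
middle_length = len(middle)
```

With these values, middle is "bcd" and its length is 4 - 1 = 3, which is
the property described above.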


> What's the reasoning for "\str_if_contains_char:NN" rather than just
> "\str_if_in:NN"?

The second N alone does not tell you whether the argument is a single
character or a string variable, so the name has to say it.

Should I code an expandable \str_if_in:nn?

> I see you have a number of "UTF_viii" functions. I can see that you are
> covering any confusion with UTF-16, but would simply "UTF" be better?

No, although I do agree that "UTF_viii" is long :(. We will need
utf-16 to deal with PDF, as Heiko pointed out in a previous email.
Perhaps we should drop support for utf-8 and instead only support
utf-16?
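
To make the difference between the two encodings concrete, here is a
small Python sketch (Python is used only for illustration; none of this
is l3str code, and the variable names are made up). It encodes the same
character both ways and builds the UTF-16BE form with a leading byte
order mark, which is the form PDF uses for Unicode text strings.

```python
text = "\u00e9"  # the single character e-acute, U+00E9

# UTF-8: variable-length, this character takes two bytes.
utf8_bytes = text.encode("utf-8")

# UTF-16 big-endian: one 16-bit code unit for this character.
utf16be_bytes = text.encode("utf-16-be")

# A PDF text string in UTF-16BE starts with the byte order mark FE FF.
pdf_text_string = b"\xfe\xff" + utf16be_bytes

# Hexadecimal views, convenient for comparing the two encodings.
utf8_hex = utf8_bytes.hex().upper()
pdf_hex = pdf_text_string.hex().upper()
```

Here utf8_hex is "C3A9" and pdf_hex is "FEFF00E9": the same character,
but entirely different byte sequences, so a conversion step is needed
in either direction.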

> I also saw that the docs mentioned "\str_if_UTF_viii:N", which does not
> exist. I've removed it, as I think the docs and the code should match as
> much as possible.

Yes. I never got to implementing it :).

Should we lower-case "utf" in function names?

--
Bruno
