Date: Sun, 16 Oct 2011 21:36:47 -0400

On 10/16/11, Joseph Wright <[log in to unmask]> wrote:
> On 10/10/2011 16:07, Bruno Le Floch wrote:
>> The l3str module provides functions to get the length of a string,
>> extract substrings or individual characters, and test for string
>> equality (the current \str_if_eq:nnTF). Some support for encodings is
>> provided: percent encoding, conversion from utf-8 to a string of
>> bytes, and most functions of Heiko Oberdiek's pdfescape package.
>> [...] highly welcome.
>
> Some comments having read the code and documentation.
Thank you Joseph for the cleanup.
> I don't like the name in \str_from_to:nnn - it sounds like a copy
> function. What's wrong with \str_substr:nnn or just \str_sub:nnn?
I couldn't think of an unambiguous name. \str_substr:nnn is fine.
> In the same function, the indexing is described as "\meta{start index}
> (inclusive) and \meta{end index} (exclusive)". This seems very odd to me
> - I'd expect
>
> \str_from_to:nnn { abcdef } { 1 } { 4 }
>
> to leave "bcde" in the input stream.
I followed the Python convention, in which you think of the index as
lying between pairs of characters:
(0)a(1)b(2)c(3)d(4)e(5)f(6)
Hence, extracting from 1 to 4 gives "bcd". The advantage of doing it
that way is that the length of what you get is 4 - 1 = 3. Another
advantage is that getting the first <n> characters is easy:
\str_substr:nnn { <string> } { 0 } { <n> }. A drawback is that getting
all characters from a given point to the end is \str_substr:nnn {
<string> } { <n> } { \c_max_int } rather than \str_substr:nnn {
<string> } { <n> } { -1 }. Does that make sense?
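
The between-characters indexing described above is the one Python slices use, so it can be checked there directly. The following is a minimal Python sketch for illustration only, not l3str code; the variable names are made up for the example.

```python
import sys

# Indices sit between characters: (0)a(1)b(2)c(3)d(4)e(5)f(6)
s = "abcdef"

# Start index inclusive, end index exclusive: from 1 to 4.
middle = s[1:4]

# The first n characters: from 0 to n (here n = 3).
first_n = s[0:3]

# From a given point to the end: an end index past the string is clamped,
# which plays the role of \c_max_int in the email.
tail = s[2:sys.maxsize]

# In Python itself, -1 means "one before the end", so this drops the "f".
all_but_last = s[2:-1]

print(middle, len(middle))
print(first_n, tail, all_but_last)
```

The length of the result is the difference of the two indices whenever both lie within the string, which is the main argument for the half-open convention.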
> What's the reasoning for "\str_if_contains_char:NN" rather than just
> "\str_if_in:NN"?
The second N argument is not enough to know whether you expect a char
or a string variable.
Should I code an expandable \str_if_in:nn?
> I see you have a number of "UTF_viii" functions. I can see that you are
> covering any confusion with UTF-16, but would simply "UTF" be better?
No, although I do agree that "UTF_viii" is long :(. We will need
utf-16 to deal with PDF, as Heiko pointed out in a previous email.
Perhaps we should drop support for utf-8 and instead only support
utf-16?
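
To illustrate why the two encodings are not interchangeable, here is a small Python sketch (not l3str code) that encodes the same character both ways. It assumes the PDF text string form that starts with the byte order mark FE FF followed by UTF-16BE data; the helper name `pdf_text_string_bytes` is hypothetical.

```python
import codecs


def pdf_text_string_bytes(text: str) -> bytes:
    """Return text as UTF-16BE bytes preceded by the byte order mark FE FF.

    Hypothetical helper for illustration; it does no PDF string escaping.
    """
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


# U+00E9 (e with acute accent) in both encodings.
utf8_bytes = "\u00e9".encode("utf-8")
utf16_bytes = pdf_text_string_bytes("\u00e9")

print(utf8_bytes.hex(), utf16_bytes.hex())
```

The UTF-8 form uses two bytes (C3 A9), while the UTF-16BE form is 00 E9 after the two-byte mark, so a string module has to know which encoding it is producing.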
> I also saw that the docs mentioned "\str_if_UTF_viii:N", which does not
> exist. I've removed it, as I think the docs and the code should match as
> much as possible.
Yes. I never got to implementing it :).
Should we lower-case "utf" in function names?
--
Bruno