On 10/16/11, Joseph Wright wrote:
> On 10/10/2011 16:07, Bruno Le Floch wrote:
>> The l3str module provides functions to get the length of a string,
>> extract substrings or individual characters, and test for string
>> equality (the current \str_if_eq:nnTF). Some support for encodings is
>> provided: percent encoding, conversion from utf-8 to a string of
>> bytes, and most functions of Heiko Oberdiek's pdfescape package.
>> [...] highly welcome.
>
> Some comments having read the code and documentation.

Thank you Joseph for the cleanup.

> I don't like the name in \str_from_to:nnn - it sounds like a copy
> function. What's wrong with \str_substr:nnn or just \str_sub:nnn?

I couldn't think of an unambiguous name. \str_substr:nnn is fine.

> In the same function, the indexing is described as "\meta{start index}
> (inclusive) and \meta{end index} (exclusive)". This seems very odd to me
> - I'd expect
>
> \str_from_to:nnn { abcdef } { 1 } { 4 }
>
> to leave "bcde" in the input stream.

I followed the Python convention, in which you think of the indices as
lying between pairs of characters:

    (0)a(1)b(2)c(3)d(4)e(5)f(6)

Hence, extracting from 1 to 4 gives "bcd". The advantage of doing it
that way is that the length of what you get is 4 - 1 = 3. Another
advantage is that getting the first <n> characters is easy:

    \str_substr:nnn { <string> } { 0 } { <n> }

A drawback is that getting all characters from a given point to the
end is

    \str_substr:nnn { <string> } { <n> } { \c_max_int }

rather than

    \str_substr:nnn { <string> } { <n> } { -1 }

Does that make sense?

> What's the reasoning for "\str_if_contains_char:NN" rather than just
> "\str_if_in:NN"?

The second N argument is not enough to know whether you expect a char
or a string variable. Should I code an expandable \str_if_in:nn?

> I see you have a number of "UTF_viii" functions. I can see that you are
> covering any confusion with UTF-16, but would simply "UTF" be better?

No, although I do agree that "UTF_viii" is long :(. We will need
utf-16 to deal with PDF, as Heiko pointed out in a previous email.
Perhaps we should drop support for utf-8 and instead only support
utf-16?

> I also saw that the docs mentioned "\str_if_UTF_viii:N", which does not
> exist. I've removed it, as I think the docs and the code should match as
> much as possible.

Yes. I never got to implementing it :). Should we lower-case "utf" in
function names?

--
Bruno
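
For reference, the half-open indexing convention discussed in this message (start index inclusive, end index exclusive) can be tried out directly in Python, which is where the convention is borrowed from. This is only an illustrative sketch: the helper name `substr` is hypothetical and is not part of l3str, and passing `sys.maxsize` as the end index merely stands in for the role \c_max_int would play.

```python
import sys


def substr(s: str, start: int, end: int) -> str:
    """Return the characters of s between index positions start and end.

    Indices are thought of as lying between characters:
    (0)a(1)b(2)c(3)d(4)e(5)f(6)
    so start is inclusive and end is exclusive.
    """
    return s[start:end]


s = "abcdef"

# Extracting from 1 to 4 yields a string of length 4 - 1.
middle = substr(s, 1, 4)

# The first n characters: start at 0, end at n.
first_three = substr(s, 0, 3)

# Everything from index 2 to the end: an end index past the end of the
# string is clamped, so a very large integer works as "until the end".
tail = substr(s, 2, sys.maxsize)

print(middle, len(middle))
print(first_three)
print(tail)
```

Note that in Python itself a negative end index counts from the right, so `s[2:-1]` stops before the last character rather than running to the end; the usual Python idiom for "to the end" is to omit the end index (`s[2:]`).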