Hello all,

To support case-changing operations in expl3, the team some time ago added an experimental pair \tl_expandable_uppercase:n/\tl_expandable_lowercase:n as alternatives to \tl_to_uppercase:n/\tl_to_lowercase:n. While the expandable operations are useful, there are issues both in terms of naming (solvable) and functionality (more complex). In particular, they cover only the ASCII range and do not offer some of the context-sensitive case changing that is required for languages other than English.

In order to address this, we have now added a new set of experimental functions to l3candidates:

- \tl_upper_case:n(n)
- \tl_lower_case:n(n)
- \tl_mixed_case:n(n)

These are x-type expandable, so can be used inside, for example, \tl_set:Nx, and when used with XeTeX or LuaTeX offer full UTF-8 character coverage.

What we are hoping for is some feedback on the interfaces, naming, etc.: we believe that the ideas are useful, and hope in the longer term to use these to replace \tl_expandable_(upper|lower)case:n and \tl_to_(upper|lower)case:n for case changing. (The latter will still be required for generating non-standard catcodes: we will provide a better interface for that process at a later date.)

(We note that while expandability is not absolutely required in this area, there are advantages to being able to simply set a tl to the case-changed version of text. We therefore feel that expandability is desirable here, and the approach we have taken to some technical issues reflects this.)
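As a minimal sketch of the expandability point (an illustration only, not code taken from l3candidates): assuming an l3kernel release whose l3candidates provides the experimental \tl_upper_case:n, the case-changed text can be stored directly with an x-type assignment. The variable name \l_demo_text_tl is made up for this example.

```latex
\documentclass{article}
\usepackage{expl3}
\ExplSyntaxOn
% Assumption: the loaded l3kernel still ships the experimental
% \tl_upper_case:n from l3candidates.
% \l_demo_text_tl is a hypothetical variable used only for this sketch.
\tl_new:N \l_demo_text_tl
% x-type expansion runs the case changer inside the assignment, so the
% variable stores the case-changed text rather than the function call.
\tl_set:Nx \l_demo_text_tl { \tl_upper_case:n { hello } }
% Write the stored token list to the terminal and log for inspection.
\tl_show:N \l_demo_text_tl
\ExplSyntaxOff
\begin{document}
\end{document}
```

With a non-expandable case changer, the same result would need a separate step before the assignment, which is the convenience referred to above.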
The versions with one argument do a relatively simple language-insensitive mapping:

    \tl_lower_case:n { HELLO }  => "hello"
    \tl_upper_case:n { hello }  => "HELLO"
    \tl_mixed_case:n { HELLO }  => "Hello"

while the two-argument versions can do language-dependent changes, such as dotted/dotless-i/I handling in Turkish:

    \tl_upper_case:nn { tr } { i }  => "İ"

The 'mixed' case variant is the low-level command needed to implement 'sentence' or 'title' case (the Unicode Consortium refer to both the low-level and higher-level mapping as title casing): here, there is no attempt to pick up on 'words' in a 'sentence'. (Once discussion on these lower-level functions is complete, we will look to see how best to provide higher-level code for title/sentence casing: these operations clearly apply to 'text', not 'token lists'.)

Some of what is required here is clear from the Unicode docs. Implementing some of the requirements in TeX, particularly in an expandable form, requires some modification of the described algorithms. Thus areas where feedback is particularly welcome include:

- Brace groups/escaping: the current version takes an approach similar to BibTeX, treating all brace groups as 'preserved'. This is a clear rule but leaves open questions on how (if at all) to handle commands in 'text'. Notably, these functions are intended for 'text-like' input, so this may not be an issue. Notice that math mode is given no special treatment but can be protected from case changing by bracing.
- Category code treatment: should case operations apply to chars on a string-like basis (current approach) or only to 'letters'? Again, as these functions seem to target 'text', category codes may not be as important here as in some other contexts.
- Chars to skip at the start of 'text' when doing 'mixed' casing (what counts as the first 'letter').
- The 'final sigma' rule in Greek: trying to handle all cases here is challenging in an expandable TeX system, and so we have implemented a more limited approach which counts a sigma as 'final' if followed by a small set of chars (currently a space or one of "!'),.:;?]}").
- The 'dot above' rule in Lithuanian: we have again implemented this using a more restricted approach than the Unicode docs describe, focussing only on chars/accents which are (we understand) used in Lithuanian.
- Whether 'mixed' case is a clear description of the idea of (informally) upper casing the first letter in 'text' and then lower casing the remainder.

The code has not gone to CTAN yet but is available on the GitHub mirror. See in particular

    https://github.com/latex3/svn-mirror/blob/master/l3kernel/l3candidates.dtx

and

    https://github.com/latex3/svn-mirror/blob/master/l3kernel/l3unicode-data.def

The latter is needed as it contains the data used for the transformations.

Feedback on all of this is very welcome: we hope to provide a high-quality interface for case changing such that it can readily be applied to a range of situations.

--
Joseph Wright