## LATEX-L@LISTSERV.UNI-HEIDELBERG.DE


On 30/06/2014 14:05, Joel C. Salomon wrote:
> There’s an important use-case that seems not to have been addressed,
> but perhaps this is better handled in a different layer:
> mixed-language strings.
>
> For example, consider a document with the title, “The Interesting Life
> of Ragıp Hulûsi Özdem” (to choose the first Turkish name I could find
> with both dotted i and dotless ı). Somehow, within the \title{}
> declaration, the change of language must be indicated so that (e.g.)
> at the top of the page this will be transformed to “THE INTERESTING
> LIFE OF RAGIP HULÛSİ ÖZDEM” and not “THE INTERESTİNG LİFE …” nor “…
> LIFE OF RAGIP HULÛSI …”.
>
> A similar situation arises in German where within geographical names
> ‘ß’ should capitalize to the recently-defined ‘ẞ’, not ‘SS’.
> (According to <http://en.wikipedia.org/wiki/Capital_ẞ>, this rule was
>
> As I said, this is probably best handled in a separate layer:  Code
> that capitalizes user-provided text would need to defer to the LaTeX3
> equivalent of Babel, which would scan the text for user-level
> language-change commands, and (among other things) call
> \tl_upper_case:nn with the appropriate language argument.  But I think
> it’s important that the interface to the casing functions being
> defined now be aware of the way they will likely be used.

As you say, this looks much more like a 'high level' requirement: it's
tricky to see how nesting can work without it also being tied to the
rest of the design. For example, we can mark up a language in the input
easily enough:

The Interesting Life of \SomeLangCommand{tr}{Ragıp Hulûsi Özdem}

but the problem is then making sure that the command does the case
changing at the point of use. One could imagine giving such a command a
flexible definition:

\TitleCase#1 =>
\cs_set_eq:NN \SomeLangCommand \text_title_case:nn
... % Other similar stuff
\text_title_case:Vn \l_language_current_tl {#1}

That might still leave a question about x- versus f-type expansion: if
the outcome is meant to be 'just text' then you need to expand
\SomeLangCommand, which at the moment is deliberately avoided. Of
course, such an issue might be avoided by doing a pre-parse, as you suggest:

\TitleCase#1 =>
\cs_set_eq:NN \SomeLangCommand \text_title_case_and_brace:nn
... % Other similar stuff
\tl_set:Nx \l_some_tmpa_tl {#1} %
% Now "The Interesting Life of {RAGIP HULÛSİ ÖZDEM}"
\tl_set:Nx \l_some_tmpa_tl
{ \text_title_case:VV \l_language_current_tl \l_some_tmpa_tl }
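A rough Python sketch of the pre-parse idea, for illustration only. It uses upper casing rather than title casing (matching the intermediate result shown in the comment above), models the per-language rules as a small lookup table covering only Turkish/Azeri dotted i, and handles only flat, un-nested \SomeLangCommand spans. None of the helper names are an existing LaTeX3 interface.

```python
import re

# Foreign-language span: \SomeLangCommand{<lang>}{<text>} (flat spans only).
LANG_RE = re.compile(r"\\SomeLangCommand\{(\w+)\}\{([^{}]*)\}")
# A brace group left behind by the pre-parse.
BRACED_RE = re.compile(r"(\{[^{}]*\})")

# Assumed per-language tweaks applied before Python's locale-independent
# str.upper(): for Turkish/Azeri, dotted i must become U+0130, not "I".
SPECIAL_UPPER = {"tr": {"i": "\u0130"}, "az": {"i": "\u0130"}}

def upper_case(lang: str, text: str) -> str:
    table = SPECIAL_UPPER.get(lang, {})
    return "".join(table.get(ch, ch) for ch in text).upper()

def preparse(text: str) -> str:
    # Case-change each marked span in its own language and brace the result.
    return LANG_RE.sub(
        lambda m: "{" + upper_case(m.group(1), m.group(2)) + "}", text
    )

def outer_pass(text: str, lang: str) -> str:
    # Case-change only text outside brace groups; re.split with a
    # capturing group puts the braced groups at the odd indices.
    parts = BRACED_RE.split(text)
    return "".join(
        part if i % 2 else upper_case(lang, part)
        for i, part in enumerate(parts)
    )

source = r"The Interesting Life of \SomeLangCommand{tr}{Ragıp Hulûsi Özdem}"
step1 = preparse(source)
print(step1)                    # The Interesting Life of {RAGIP HULÛSİ ÖZDEM}
print(outer_pass(step1, "en"))  # THE INTERESTING LIFE OF {RAGIP HULÛSİ ÖZDEM}
```

Without the pre-parse, an outer pass using English rules would turn "Hulûsi" into "HULÛSI"; bracing the already-converted span is what protects it.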

'capital Eszett' business for the simple case. I'm not quite sure how
one is meant to do that (not de-DE or whatever, but de-<something>!).
I'm also not sure whether people who do use it will use it for all
Eszetts (otherwise upper casing might produce some odd results). As
you say, this could of course occur in input where such a decision
applies only to some cases.
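One way to sidestep the question of a language tag is to treat capital Eszett as an explicit per-call (or per-span) option rather than a property of the language. A minimal Python sketch of that idea; the function and its capital_eszett flag are hypothetical, and the default follows Python's own str.upper(), which maps ß to SS:

```python
CAPITAL_ESZETT = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S

def upper_case_de(text: str, capital_eszett: bool = False) -> str:
    """Upper-case German text.

    By default "ß" becomes "SS" (what str.upper() does); with
    capital_eszett=True it becomes U+1E9E instead.
    """
    if capital_eszett:
        text = text.replace("ß", CAPITAL_ESZETT)
    return text.upper()

# The decision can be made span by span, e.g. only for a place name.
spans = [("Straße nach ", False), ("Gießen", True)]
print("".join(upper_case_de(t, flag) for t, flag in spans))
# STRASSE NACH GIEẞEN
```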

What is worth noting is that while the commands I've added take general
text as input, we are seeing them as building blocks for e.g. a
hypothetical \text_title_case:nn. That and related operations need to
do things like worry about 'words', and in that context splitting up the
input is needed anyway. (Will has an expandable approach to do that. We
might imagine seeking to add to that some form of 'recursion' for nested
languages.)
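To show why splitting into words comes in, a minimal Python sketch of language-dependent title casing. It splits on whitespace only, upper-cases the first character of each word (rather than applying full Unicode titlecase mappings) and leaves the rest of the word unchanged. The language table is an assumption covering only dotted i, and there is no recursion for nested languages; that would mean applying it per language span, much as in the pre-parse.

```python
import re

# Assumed per-language mapping for the first letter of a word; Python's
# str.upper() maps "i" to "I", which is wrong for Turkish and Azeri.
SPECIAL_UPPER = {"tr": {"i": "\u0130"}, "az": {"i": "\u0130"}}

def upper_first(lang: str, word: str) -> str:
    # Upper-case only the first character; the rest is left as-is.
    if not word:
        return word
    first = SPECIAL_UPPER.get(lang, {}).get(word[0], word[0]).upper()
    return first + word[1:]

def title_case(lang: str, text: str) -> str:
    # Split on runs of whitespace, keeping the separators at odd indices.
    parts = re.split(r"(\s+)", text)
    return "".join(
        part if i % 2 else upper_first(lang, part)
        for i, part in enumerate(parts)
    )

print(title_case("en", "izmir and istanbul"))  # Izmir And Istanbul
print(title_case("tr", "izmir ve istanbul"))   # İzmir Ve İstanbul
```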

BTW, nice Turkish name: that one is going into the test suite for this area!
--
Joseph Wright