LATEX-L Archives

Mailing list for the LaTeX3 project

LATEX-L@LISTSERV.UNI-HEIDELBERG.DE

Sender: Mailing list for the LaTeX3 project <[log in to unmask]>
Date: Thu, 16 Mar 2023 16:51:51 +0100
Reply-To: Mailing list for the LaTeX3 project <[log in to unmask]>
From: Javier Bezos <[log in to unmask]>
>> Although intentional, this is not quite correct if we follow the
>> Unicode CLDR, which considers the ‘default’ Norwegian to be ‘no’
>> (technically it’s a macrolanguage, and ‘nb’=“Norwegian Bokmål” is
>> empty and just inherits from it). This is the correspondence made
>> by babel, which follows Unicode.
> 
>   Except that that explanation is a bit turned on its head :-)  First,
> the Common Locale Data Repository is not a part of the Unicode Standard
> (which encodes scripts, not languages),

Sure, but Unicode (more precisely Unicode, Inc., aka Unicode
Consortium) is also the name of the organization maintaining
the Unicode Standard and...

> it’s simply hosted on the
> Unicode servers.

... the Unicode Common Locale Data Repository. This is its full
name, as often used, and the copyright holder is Unicode. What’s
hosted is the ICU, but we are digressing.

> Second, the fact that [nb] inherits from [no] without
> changes in the CLDR means precisely the opposite of the interpretation
> you make of it: it means that for the CLDR, Norwegian is [nb], that’s
> the relevant piece of information. 

From the CLDR:

<language type="nb">Norwegian Bokmål</language>
<language type="nn">Norwegian Nynorsk</language>
<language type="no">Norwegian</language>

So, the ‘default’ Norwegian is “no”. Both “nb” and “nn” inherit
from “no”, but “nb” (which represents “nb_NO”) is in addition
the ‘default content locale’ (listed as such in
supplementalMetadata.xml) for “no”. This means there is a
relation no/nb similar to en/en_US. See

https://cldr.unicode.org/translation/translation-guide-general/default-content
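To make the no/nb relation concrete, here is a minimal sketch in Python. It is a toy model only: the tables are written by hand from what is said above, they are not read from the CLDR files, and the names PARENT, DEFAULT_CONTENT, inheritance_chain and expand_default are hypothetical, not part of any CLDR or ICU API.

```python
# Toy model of the relation described above. The tables are hand-written
# for illustration and are NOT extracted from CLDR data files.

# Display names, as quoted from the CLDR in this message.
DISPLAY_NAMES = {
    "nb": "Norwegian Bokmål",
    "nn": "Norwegian Nynorsk",
    "no": "Norwegian",
}

# Assumption: both "nb" and "nn" inherit from "no"; "en_US" from "en".
PARENT = {"nb": "no", "nn": "no", "en_US": "en"}

# Assumption: "nb" is the default content locale for "no",
# in the same way that "en_US" is for "en".
DEFAULT_CONTENT = {"no": "nb", "en": "en_US"}


def inheritance_chain(locale):
    """Return the locale followed by its ancestors, nearest first."""
    chain = [locale]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def expand_default(locale):
    """Return the default content locale, or the locale itself if none."""
    return DEFAULT_CONTENT.get(locale, locale)


if __name__ == "__main__":
    for tag in ("no", "nb", "nn"):
        print(tag, DISPLAY_NAMES[tag], inheritance_chain(tag), expand_default(tag))
```

In this model a request for “no” resolves to “nb”, while “nn” stays a separate locale that merely shares its parent.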

A hint – there are two Norwegian Wikipedias, namely
no.wikipedia.org and nn.wikipedia.org. There is no separate
nb.wikipedia.org, which is just a redirection to the ‘real’ thing.

> so that hyph-no.tex contains their intersection, which both
> hyph-nb.tex and hyph-nn.tex load, with a very small number of additions
> Hence the code [no] is only used as a convenience and does not mean
> “these are the patterns you get when you request [no]”.  

Using an existing tag assigned to a (meta)language as a convenience
for the intersection of other locales doesn’t seem like a good idea.

>   Anyway, the more general point is that the comments in the hyph-*
> files have been very carefully tailored to generate exactly the contents
> of language.dat (and other infrastructure files) that we need; 

This was basically my conclusion. Even if not quite correct, it
works in practice as it should.

Javier
