LATEX-L Archives

Mailing list for the LaTeX3 project


From: Lars Hellström
Reply-To: Mailing list for the LaTeX3 project
Date: Mon, 6 Mar 2006 08:49:32 +0100

On Sunday, 5 March 2006, at 21:30, Frank Mittelbach wrote:
> that is not to say that the line
>>>>  \DeclareUnicodeCharacter{02C6}{\textasciicircum}
> is probably wrong; it should most likely be
>   \DeclareUnicodeCharacter{005E}{\textasciicircum}
> and several others have similar defects. would be good if that got
> checked.

Is that even a legal definition? U+005E (^) is, as was mentioned 
earlier in this thread, syntax in LaTeX, so you can't have inputenc map 
it to something. Or are you thinking of some attempt at supporting 
verbatim input?

>> Example: Assuming there is a word "deaffish" and the
>> author does not want a ligature ffi spanning both word parts.
>> Therefore, having a good editor, he uses the Unicode sequence
>> U+0066 U+FB01 to specify the correct and desired ligature.
>> Using the latter case of \DeclareUnicodeCharacter{FB01}
>> TeX would get "ffi" and then form the wrong ligature.
> wrong example in my opinion. as Lars said: fi or ffi ligature ended up
> in unicode as legacy codes because they were in legacy 8-bit
> encodings. a million other ligatures are not available as "chars"
> because UC like most other standards are heavily influenced by what is
> right for certain countries but not others. using "fi" in this way is
> like using tables in html to position elements on the page, ie it
> works for that example but ...
> so the right thing is not to use fi at all here but would be to have a
> generic method to denote subword boundaries or whatever to allow the
> formatter not to use the ligature. TeX's method would be
> \textcompwordmark ... but unicode never thought that such encoding of
> logical information is the task of the standard.

Actually, U+200C (ZERO WIDTH NON-JOINER) seems to me a perfect match to 
\textcompwordmark, and I've entered it as such in my "Draft 
specification for the T1 encoding".
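
A minimal sketch of such a mapping, for illustration only. It assumes the utf8 option of inputenc, the T1 font encoding, and the article class; none of these choices comes from the draft specification itself.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage[utf8]{inputenc}
% Assumed mapping: treat U+200C (ZERO WIDTH NON-JOINER) in the input
% as the compound word mark.
\DeclareUnicodeCharacter{200C}{\textcompwordmark}
\begin{document}
% With the declaration above, typing U+200C between "deaf" and "fish"
% is intended to be equivalent to the explicit markup on the next line.
deaf\textcompwordmark{}fish
\end{document}
```

The explicit \textcompwordmark is used in the body so that the example does not depend on an invisible character surviving copy and paste.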

More pragmatically, one may of course write "deaf\-fish" to not only 
escape the ligature, but also point out the proper point of hyphenation.
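
For comparison, a small sketch of the pragmatic variant next to the unmarked word. The article class and the T1 font encoding are assumptions made only to give a complete document.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\begin{document}
% Unmarked: TeX is free to form the ffi ligature across the word parts.
deaffish

% Discretionary hyphen: no ligature across the boundary, and a line
% break is permitted at exactly this point.
deaf\-fish
\end{document}
```

Note that \- also restricts hyphenation of the word to the marked position, which may or may not be wanted.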

Lars Hellström