> Afaik that is exactly the problem, that there is no explanation
> (=documenation/specification) on the XeTeX side. In most cases
> the contents of \special expects bytes, the conversion of bytes
> with 8 bits to UTF-8 destroys the contents.
Right, so I'll have to look at the precise usage you make of this in
>> Putting non-basic encodings in .def files is probably the best
>> approach indeed. The time it takes for me to code all that is probably
>> the only disadvantage, since that means postponing the floating point
> I wouldn't do it manually. There are mapping files for Unicode:
> In a project I am using these mappings together with a Perl script
> to generate the .def files.
Thank you for the link. It seems the simplest approach would be to use
the tables provided there directly as the .def files: set
\catcode`\#=14 and a few other default catcodes, then input the
file, looping over the lines. Are all of the lines of the form
0xHH 0xHHHH # comment
(or comment lines), with H = some hexadecimal digit? In other words,
are all those encodings 8-bit only, and with only Unicode points
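If the tables really do follow that one-line format, they can be read
mechanically. A minimal sketch in Python of the parsing step (the regex
and the sample lines are my own illustration, not taken from any
particular mapping table; lines without a second field, such as
"undefined" entries, are simply skipped):

```python
import re

# Match data lines of the form "0xHH 0xHHHH  # comment" from an
# 8-bit-to-Unicode mapping table; pure comment lines do not match.
LINE = re.compile(r'^0x([0-9A-Fa-f]{2})\s+0x([0-9A-Fa-f]{4,6})')

def parse_mapping(text):
    """Build a byte -> Unicode code point dictionary."""
    table = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            table[int(m.group(1), 16)] = int(m.group(2), 16)
    return table

sample = """# comment line
0x41 0x0041 # LATIN CAPITAL LETTER A
0xE4 0x00E4 # LATIN SMALL LETTER A WITH DIAERESIS
"""
print(parse_mapping(sample))  # {65: 65, 228: 228}
```

A TeX-side reader would do the same thing with catcodes and a line loop,
as described above.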
> Of course not, size is 2N.
> * All safe characters could be used, then the size decreases
> (e.g. ASCII85, ...). But the problem is to find safe characters.
> In especially this set might change.
> * Some kind of compression could be applied.
Right. I was mostly thinking of speed, with my comments on \lowercase.
Space is not an issue internally to a TeX run (unless you start
manipulating really massive strings). It can be a problem when writing
to the PDF file.
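To make the size trade-off concrete: hex encoding is exactly 2N, while
ASCII85 comes out around 1.25N (4 bytes become 5 characters). A quick
Python illustration; note that Python's `a85encode` implements the
Adobe-style ASCII85 alphabet:

```python
import binascii
import base64

data = bytes(range(256))  # arbitrary binary payload, no all-zero groups

hex_form = binascii.hexlify(data)   # 2 output bytes per input byte
a85_form = base64.a85encode(data)   # 5 output bytes per 4 input bytes

print(len(data), len(hex_form), len(a85_form))  # 256 512 320
```

The catch mentioned above remains: ASCII85 only helps if all 85
characters are "safe" in the target context, and that set can change.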
> It could be made even expandable in linear time
> with a large lookup table (256).
Right. I was thinking in terms of UTF-8 for some reason, and the
lookup table would be too big.
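The 256-entry lookup idea is easy to see outside TeX. Here is a Python
sketch of my understanding of it (in TeX terms each table entry would be
a \csname-based macro, and pdfTeX's \pdfunescapehex does the same job
natively): one table lookup per character, so decoding is linear time.

```python
# 256-entry table: character code -> nibble value (None = not a hex digit).
NIBBLE = [None] * 256
for i, c in enumerate('0123456789abcdef'):
    NIBBLE[ord(c)] = i
    NIBBLE[ord(c.upper())] = i

def unhex(s):
    """Decode a hex string with one table lookup per character."""
    out = bytearray()
    for i in range(0, len(s), 2):
        out.append(NIBBLE[ord(s[i])] * 16 + NIBBLE[ord(s[i + 1])])
    return bytes(out)

print(unhex('48656C6C6F'))  # b'Hello'
```

For UTF-8 input the analogous table would have to be indexed by code
point rather than byte, which is exactly why it gets too big.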
> And there is engine support (\pdfunescapehex).
>> A safe format where
>> more characters are as is seems possibly faster? Also, this doesn't
>> allow storage of Unicode data (unless we use an UTF, but the overhead
>> of decoding the UTF may be large). Do you think we could devise a more
>> efficient method?
> I think that depends on the Unicode support of the engine.
Right. I need to give some serious thought to optimization in all
those encoding translations and string storage mechanisms. Give me a
few weeks to come up with a good idea.
> Very short (e.g. label/anchor names, ...) up to very huge (e.g. object
> stream data of images, ...).
That's tough, then :).