> AFAIK that is exactly the problem: there is no explanation
> (= documentation/specification) on the XeTeX side. In most cases
> \special expects bytes as its contents, and converting 8-bit
> bytes to UTF-8 destroys the contents.

Right, so I'll have to look at the precise usage you make of this in
your packages.

>> Putting non-basic encodings in .def files is probably the best
>> approach indeed. The time it takes for me to code all that is
>> probably the only disadvantage, since that means postponing the
>> floating point module.
>
> I wouldn't do it manually. There are mapping files for Unicode:
>   http://unicode.org/Public/MAPPINGS/
> In a project I am using these mappings together with a perl script
> to generate the .def files.

Thank you for the link. It seems that the simplest approach would be
to use the tables provided there directly as the .def files: simply
\catcode`\#=14, set a few other default catcodes, then input the
file, looping over the lines (rough sketch in the P.S. below). Are
all of the lines of the form

  0xHH 0xHHHH # comment

(or pure comment lines), with H a hexadecimal digit? In other words,
are all those encodings 8-bit only, and mapping only to Unicode code
points below 65536?

> Of course not, size is 2N.
> * All safe characters could be used, then the size decreases
>   (e.g. ASCII85, ...). But the problem is to find safe characters.
>   In particular this set might change.
> * Some kind of compression could be applied.

Right. I was mostly thinking of speed with my comments on
\lowercase. Space is not an issue internally to a TeX run (unless
you start manipulating really massive strings); it can be a problem
when writing to the PDF file.

> It could be made even expandable in linear time
> with a large lookup table (256).

Right. I was thinking in terms of UTF-8 for some reason, where the
lookup table would be too big (see the second postscript for what I
understand the 256-entry version to look like).

> And there is engine support (\pdfunescapehex).

Good.

>> A safe format where more characters are kept as is seems possibly
>> faster? Also, this doesn't allow storage of Unicode data (unless
>> we use a UTF, but the overhead of decoding the UTF may be large).
>> Do you think we could devise a more efficient method?
>
> I think that depends on the Unicode support of the engine.

Right. I need to give some serious thought to optimizing all those
encoding translations and string storage. Give me a few weeks to
form a good idea.

> Very short (e.g. label/anchor names, ...) up to very huge (e.g.
> object stream data of images, ...).

That's tough, then :).

Regards,
Bruno
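
P.S. To make the reading loop concrete, here is the kind of code I
have in mind (a rough, untested sketch: \parseline, \mapfile, \stop
and the \csname naming are all invented for the example, and it
assumes each data line really has the form 0xHH 0xHHHH # comment;
unmapped bytes or other line shapes would need an extra guard):

  \newread\mapfile
  % #1 = the byte (HH), #2 = the code point (HHHH, possibly with a
  % trailing space to trim in real code); \stop is only a delimiter
  % token and is never executed.
  \def\parseline 0x#1 0x#2\stop{%
    \expandafter\def\csname str_byte_#1\endcsname{#2}%
  }
  \begingroup
    \catcode`\#=14   % the table's comments are discarded by \read
    \catcode`\^^I=10 % tabs behave like spaces
    \endlinechar=-1  % comment-only lines then read as empty
    \openin\mapfile=8859-1.TXT
    \loop\unless\ifeof\mapfile
      \read\mapfile to \tableline
      \ifx\tableline\empty\else
        \expandafter\parseline\tableline\stop
      \fi
    \repeat
    \closein\mapfile
  \endgroup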
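
P.P.S. On the 256-entry lookup table, here is how I understand the
expandable, linear-time encoding (again a sketch; \strhex and the
\str_hex_... names are invented, building the 256 \str_hex_...
macros, one per byte and each holding its two hex digits, is set-up
code not shown, and spaces in the input would need special
handling):

  % Assuming \str_hex_<char> expands to the two hex digits of that
  % byte, e.g. \str_hex_a -> 61, encoding is one expandable pass:
  \def\strhex#1{%
    \ifx\relax#1% \relax marks the end of the input
    \else
      \csname str_hex_\string#1\endcsname
      \expandafter\strhex
    \fi
  }
  % usage (in an expansion-only context such as \edef or \write):
  %   \strhex abc\relax   ->   616263

With pdfTeX the engine primitives do this directly and expandably:
\pdfescapehex{abc} gives 616263, and \pdfunescapehex{616263} gives
back abc.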