On 16/03/20, 17:32, Joseph Wright wrote:
> On 16/03/2020 17:01, Kelly Smith wrote:
> > Hello!
> >
> > I’ve been thinking: since Lua is already involved in the build process,
> > by way of l3build, wouldn’t it be reasonable to use a lua script
> > to preprocess Unicode data into forms that are easily consumed by LaTeX
> > during the format-building process?
> >
> > Warmly,
> > Kelly
> >
>
> It depends on the outcome you are after.
>
> The original loading method for Unicode data in XeTeX was via a Perl script.
> That created a .tex file containing (for example) catcode data. To update
> the Unicode data, one had to run the Perl script, then send the processed
> files to CTAN. There were two issues. First, that meant that any change
> required active work to not only get the data from Unicode but also to
> manipulate it. Second, and more significant, it was *slower* than just
> reading the files in TeX. (This only became apparent when I wrote some test
> parsers.)
>
> Now, there is more data being loaded today than when I did that work, and
> some of it is in LuaTeX so could be done Lua-only. It's also possible that
> the Perl script was sub-optimal, or that as part of a general 'install'
> function the time would not really show. However, XeTeX needs the data, so
> one is still looking at having to explicitly pre-process in Lua. Moreover,
> most of the time taken for format-building is not about reading Unicode
> data. With LuaTeX, pre-loading expl3 does cut out a slight 'stall' when
> loading everything for case-changing, but having a LuaTeX and a XeTeX path
> separately is not attractive.

Is there any distribution that doesn't have LuaTeX in the default
installation? (Apart from exotic things like TeX Live infra-only) Then it
would be conceivable to just make LuaTeX a hard requirement and process the
Unicode data on the fly instead of going via CTAN.

> The current set-up means that updating the Unicode files is just a question
> of copy-pasting the raw .txt files into a form that CTAN can accept.
> Pre-digesting still leaves us needing some way to co-ordinate between
> packages (format, luaotfload, expl3, specialist stuff), plus with having to
> do the explicit extraction.
>
> As format-building is all about saving time for 'normal' runs, I'm not
> seeing there is a massive need to speed up the process. I know there is one
> engine in development that doesn't use format files, so that might be a
> place to consider things, but I think we'd need a strong case to alter the
> approach for XeTeX/LuaTeX (pdfTeX, ...).

Are you referring to JSBox? I doubt that this will ever be public.

Cheers, Henri

> Joseph