On 16/03/2020 17:01, Kelly Smith wrote:
> Hello!
>
> I’ve been thinking: since Lua is already involved in the build process,
> by way of l3build, wouldn’t it be reasonable to use a lua script
> to preprocess Unicode data into forms that are easily consumed by LaTeX
> during the format-building process?
>
> Warmly,
> Kelly
>

It depends on the outcome you are after.

The original loading method for Unicode data in XeTeX was via a Perl script. That created a .tex file containing (for example) catcode data. To update the Unicode data, one had to run the Perl script, then send the processed files to CTAN.

There were two issues. First, that meant that any change required active work to not only get the data from Unicode but also to manipulate it. Second, and more significant, it was *slower* than just reading the files in TeX. (This only became apparent when I wrote some test parsers.)

Now, there is more data being loaded today than when I did that work, and some of it is in LuaTeX so could be done Lua-only. It's also possible that the Perl script was sub-optimal, or that as part of a general 'install' function the time would not really show. However, XeTeX needs the data, so one is still looking at having to explicitly pre-process in Lua. Moreover, most of the time taken for format-building is not about reading Unicode data. With LuaTeX, pre-loading expl3 does cut out a slight 'stall' when loading everything for case-changing, but having a LuaTeX and a XeTeX path separately is not attractive.

The current set-up means that updating the Unicode files is just a question of copy-pasting the raw .txt files into a form that CTAN can accept. Pre-digesting still leaves us needing some way to co-ordinate between packages (format, luaotfload, expl3, specialist stuff), plus with having to do the explicit extraction. As format-building is all about saving time for 'normal' runs, I'm not seeing there is a massive need to speed up the process.

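For concreteness, the kind of pre-processing step discussed above (raw Unicode data in, a .tex file of catcode settings out) might look roughly like the sketch below. It is written in Python purely for illustration, not in Perl or Lua, and it is not the script that was actually used. The function name `catcode_lines` is made up. The rule "general category starting with L becomes catcode 11" is a deliberate simplification, and range entries in UnicodeData.txt are simply skipped.

```python
def catcode_lines(unicode_data_lines):
    """Turn UnicodeData.txt-style records into TeX catcode assignments.

    Hypothetical helper for illustration only: it marks every code point
    whose general category starts with "L" as a letter (catcode 11).
    """
    out = []
    for line in unicode_data_lines:
        line = line.strip()
        if not line:
            continue
        # UnicodeData.txt is semicolon-separated:
        # field 0 = code point (hex), field 1 = name, field 2 = general category
        fields = line.split(";")
        codepoint, name, category = fields[0], fields[1], fields[2]
        # Range markers such as "<CJK Ideograph, First>" are skipped in this sketch
        if name.startswith("<") and name.endswith(("First>", "Last>")):
            continue
        if category.startswith("L"):
            out.append('\\catcode"%s=11' % codepoint)
    return out


# Three records in the UnicodeData.txt layout
sample = [
    "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;",
    "0031;DIGIT ONE;Nd;0;EN;;1;1;1;N;;;;;",
    "00E9;LATIN SMALL LETTER E WITH ACUTE;Ll;0;L;0065 0301;;;;N;LATIN SMALL LETTER E ACUTE;;00C9;;00C9",
]

for tex_line in catcode_lines(sample):
    print(tex_line)
```

The output of such a step is a second set of files that has to be regenerated and shipped whenever the Unicode data changes, which is the maintenance cost described above.
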
I know there is one engine in development that doesn't use format files, so that might be a place to consider things, but I think we'd need a strong case to alter the approach for XeTeX/LuaTeX (pdfTeX, ...).

Joseph