On 16/03/2020 22:54, Kelly Smith wrote:
>> As format-building is all about saving time for 'normal' runs, I'm not
>> seeing there is a massive need to speed up the process. I know there is
>> one engine in development that doesn't use format files, so that might
>> be a place to consider things, but I think we'd need a strong case to
>> alter the approach for XeTeX/LuaTeX (pdfTeX, ...).
> Sorry, I should’ve clarified: the point of preprocessing the data wouldn’t
> be to speed up anything, instead, the point would be to do complex
> processing that would be very difficult or even impossible in LaTeX.
> For example, if the l3regex module were extended so that precompiled
> regexes could be used as parts of other regexes, then Unicode properties
> could be simply implemented by referring to precompiled regexes whose
> content was created by running filters over the Unicode character database.
> Another example would be processing the very complex XML files that are
> used in supplementary Unicode files, like the Common Locale Data Repository,
> which could help with localization and language-specific date and number
> formatting.
> This idea of preprocessing could be applied to any complex data set that
> LaTeX3 may need to work with, but I used the example of Unicode data
> because that’s the one that immediately came to mind.
When there is a need to do complex pre-processing, Lua is the obvious
way to go nowadays, and that's reflected in a number of scripts. The
issue tends to be not how one scripts the extraction, but rather the
target at the TeX end.
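To make the split between extraction and target concrete, here is a minimal sketch of the kind of filter over the Unicode character database mentioned above. It is written in Python only for brevity (the same logic could be written in Lua), it assumes input in the semicolon-separated UnicodeData.txt layout, and \MyUnicodeRange is a hypothetical macro name, not something l3regex or any other package provides. The sample lines are there purely for illustration.

```python
# Sketch: collect the code points of one General_Category from text in
# UnicodeData.txt format and emit them as compact ranges for TeX to load.
# \MyUnicodeRange is a hypothetical macro, not part of any existing package.

def category_ranges(lines, category):
    """Return inclusive (first, last) code point ranges, in input order."""
    ranges = []
    range_start = None
    for line in lines:
        fields = line.strip().split(";")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        code, name, cat = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            range_start = code  # large blocks are listed as First/Last pairs
            continue
        first = code
        if name.endswith(", Last>") and range_start is not None:
            first, range_start = range_start, None
        if cat != category:
            continue
        if ranges and ranges[-1][1] + 1 == first:
            ranges[-1] = (ranges[-1][0], code)  # extend the previous range
        else:
            ranges.append((first, code))
    return ranges


def to_tex(ranges, macro="\\MyUnicodeRange"):
    """Format ranges as one (hypothetical) macro call per line."""
    return "\n".join(f'{macro}{{"{a:04X}}}{{"{b:04X}}}' for a, b in ranges)


# Illustrative sample lines in UnicodeData.txt layout.
sample = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
00C0;LATIN CAPITAL LETTER A WITH GRAVE;Lu;0;L;0041 0300;;;;N;LATIN CAPITAL LETTER A GRAVE;;;00E0;
""".splitlines()

print(to_tex(category_ranges(sample, "Lu")))
```

The extraction half (category_ranges) is the easy part; the open question is the output of to_tex, i.e. what form of data the TeX side would actually want to consume.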
For example, on the regex idea, I suspect performance would be the major
concern: regex processing is already a lot of work, and I have a feeling
Bruno would want to optimise how the data were stored inside TeX. (I'm
not sure where the balance between data extraction and storage lies
here.) Similarly, a lot of work is done by Javier for babel using the
CLDR, but many of the outcomes are not amenable to scripting: it's
about how you set up the TeX 'just right'.
So whilst there is no reason not to use Lua when it works, at the moment
there are no pressing areas where the pre-processing is the barrier.