On 16/03/20, 17:32, Joseph Wright wrote:
> On 16/03/2020 17:01, Kelly Smith wrote:
> > Hello!
> >
> > I’ve been thinking: since Lua is already involved in the build process,
> > by way of l3build, wouldn’t it be reasonable to use a Lua script
> > to preprocess Unicode data into forms that are easily consumed by LaTeX
> > during the format-building process?
> >
> > Warmly,
> > Kelly
> >
>
> It depends on the outcome you are after.
>
> The original loading method for Unicode data in XeTeX was via a Perl script.
> That created a .tex file containing (for example) catcode data. To update
> the Unicode data, one had to run the Perl script, then send the processed
> files to CTAN. There were two issues. First, that meant that any change
> required active work to not only get the data from Unicode but also to
> manipulate it. Second, and more significant, it was *slower* than just
> reading the files in TeX. (This only became apparent when I wrote some test
> parsers.)
>
> Now, there is more data being loaded today than when I did that work, and
> some of it is only needed by LuaTeX, so could be done Lua-only. It's also possible that
> the Perl script was sub-optimal, or that as part of a general 'install'
> function the time would not really show. However, XeTeX needs the data, so
> one is still looking at having to explicitly pre-process in Lua. Moreover,
> most of the time taken for format-building is not about reading Unicode
> data. With LuaTeX, pre-loading expl3 does cut out a slight 'stall' when
> loading everything for case-changing, but having a LuaTeX and a XeTeX path
> separately is not attractive.
Is there any distribution that doesn't have LuaTeX in the default
installation (apart from exotic things like TeX Live infra-only)? If not,
it would be conceivable to just make LuaTeX a hard requirement and
process the Unicode data on the fly instead of going via CTAN.
> The current set-up means that updating the Unicode files is just a question
> of copy-pasting the raw .txt files into a form that CTAN can accept.
> Pre-digesting still leaves us needing some way to co-ordinate between
> packages (format, luaotfload, expl3, specialist stuff), plus having to
> do the explicit extraction.
>
> As format-building is all about saving time for 'normal' runs, I'm not
> seeing that there is a massive need to speed up the process. I know there is one
> engine in development that doesn't use format files, so that might be a
> place to consider things, but I think we'd need a strong case to alter the
> approach for XeTeX/LuaTeX (pdfTeX, ...).
Are you referring to JSBox? I doubt that this will ever be public.
Cheers, Henri
> Joseph