Hi Bill

An update:

I believe the answer to the first is "yes".  For one thing, I think the LaTeX being submitted to arXiv (arxiv.org) is improving.  I often take arXiv source and try to run it through tex4ht to produce HTML+MathML rendered with MathJax, and I see success there as a sign of well-structured LaTeX.  But I have not made a systematic investigation of this point.

But Bruce and Gabor have.  They have worked on the whole arXiv (to produce, in particular, very information-rich MathML) as part of the semantic maths project.  So they can probably give you lots of useful data on this and on many other subjects of interest to LATEX-L discussions.

Best,
chris