At 09:28 +0000 2001/02/13, Robin Fairbairns wrote:
>none of these groups has yet "finished the job". they have some
>interesting results, but probably could not parse the awful english
>that i write.
>can we practically hypothecate a markup language that depends on
>comprehending natural language?
>surely not, even in the hans aberg universe?
I have no immediate plans to write a natural language parser, if that is
what you mean.
Otherwise, I stated the general principle: the better the parsing becomes,
the less markup will be needed (or the more sophisticated it can be).
As for natural language parsing, one problem is that humans, using their
massively parallel supercomputers, can scan a sentence and try many
different patterns. Let's try parsing the Frank Mittelbach example:
The a in the formula is a variable.
You would probably use the contextual knowledge that the sentence mixes
English and math, and scan it to recognize that the second "a", but not the
first, is an indefinite article. From that, you would infer that the first
"a" must be a math symbol, which is supported by the semantic information
in the wording "in the formula".
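To make this pattern-trying concrete, here is a toy sketch; the tag names
and the two grammar rules are my own invented approximations, not any real
NLP system. It enumerates the possible readings of the two "a"s and keeps
only the combinations that pass the crude checks:

```python
from itertools import product

SENTENCE = "The a in the formula is a variable".lower().split()
NOUNS = {"formula", "variable"}
READINGS_OF_A = ("article", "symbol")  # the two candidate readings

def consistent(tags):
    for i, (word, tag) in enumerate(zip(SENTENCE, tags)):
        nxt = SENTENCE[i + 1] if i + 1 < len(SENTENCE) else None
        # Rule 1: an indefinite article must be followed by a noun.
        if tag == "article" and nxt not in NOUNS:
            return False
        # Rule 2: a noun needs "the" or an article right before it.
        if word in NOUNS and SENTENCE[i - 1] != "the" and tags[i - 1] != "article":
            return False
    return True

positions = [i for i, w in enumerate(SENTENCE) if w == "a"]
readings = []
for combo in product(READINGS_OF_A, repeat=len(positions)):
    tags = ["word"] * len(SENTENCE)
    for i, tag in zip(positions, combo):
        tags[i] = tag
    if consistent(tags):
        readings.append(combo)

print(readings)  # [('symbol', 'article')]: first "a" symbol, second article
```

Even this trivial sentence forces the machine to try four tag combinations;
a human discards the bad ones without noticing.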
But in a computer, as soon as one leaves deterministic parsing, the
parsing becomes slow. There is a non-deterministic version of Bison, and
it is reported on the GCC list to be slow, I am told. Also, extracting
semantic information from natural language is difficult for similar
reasons: one has to try a number of different patterns.
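The cost of keeping many candidate parses alive can be seen in a toy
count. Under the classically ambiguous grammar E -> E E | "a" (my own
example, not anything taken from Bison), the number of parse trees of a
string of n "a"s is the Catalan number, which grows exponentially:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parses(n):
    """Parse trees of "a"*n under the ambiguous grammar E -> E E | "a"."""
    if n == 1:
        return 1
    # Split into a left part of length k and a right part of length n - k.
    return sum(parses(k) * parses(n - k) for k in range(1, n))

print([parses(n) for n in range(1, 9)])
# [1, 1, 2, 5, 14, 42, 132, 429] -- the Catalan numbers
```

A non-deterministic parser does not necessarily enumerate all of these,
but it has to manage that ambiguity somehow, which is where the slowness
comes from.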
So for authoring languages, one will probably be stuck with deterministic
parsing techniques and the corresponding markup. But it is still the case
that better grammars will decrease the markup needed, and that will also
help the authors using such a language.