At 09:28 +0000 2001/02/13, Robin Fairbairns wrote:
>none of these groups has yet "finished the job". they have some
>interesting results, but probably could not parse the awful english
>that i write.
>can we practically hypothecate a markup language that depends on
>comprehending natural language?
>surely not, even in the hans aberg universe?
I have no immediate plans to write a natural language parser, if that is
what you mean.
Otherwise, I stated the general principle: the better the parsing becomes,
the less markup will be needed (or the more sophisticated it can be).
As for natural language parsing, one problem is that humans, using their
massively parallel supercomputers, can scan a sentence and try many
different patterns. Let's try parsing the Frank Mittelbach example:
The a in the formula is a variable.
You would probably use the contextual knowledge that the sentence mixes
English and math, and scan it to recognize that the second "a", but not the
first, is an indefinite article. From that, you would infer that the first
"a" must be a math symbol, which is supported by the semantic information
in the wording "in the formula".
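To make this pattern-trying concrete, here is a toy sketch; the tag names
and the two grammar rules are my own invented approximations, not any real
NLP system. It enumerates the possible readings of the two "a"s and keeps
only the combinations that pass the crude checks:

```python
from itertools import product

SENTENCE = "The a in the formula is a variable".lower().split()
NOUNS = {"formula", "variable"}
READINGS_OF_A = ("article", "symbol")  # the two candidate readings

def consistent(tags):
    for i, (word, tag) in enumerate(zip(SENTENCE, tags)):
        nxt = SENTENCE[i + 1] if i + 1 < len(SENTENCE) else None
        # Rule 1: an indefinite article must be followed by a noun.
        if tag == "article" and nxt not in NOUNS:
            return False
        # Rule 2: a noun needs "the" or an article right before it.
        if word in NOUNS and SENTENCE[i - 1] != "the" and tags[i - 1] != "article":
            return False
    return True

positions = [i for i, w in enumerate(SENTENCE) if w == "a"]
readings = []
for combo in product(READINGS_OF_A, repeat=len(positions)):
    tags = ["word"] * len(SENTENCE)
    for i, tag in zip(positions, combo):
        tags[i] = tag
    if consistent(tags):
        readings.append(combo)

print(readings)  # [('symbol', 'article')]: first "a" symbol, second article
```

Even this trivial sentence forces the machine to try four tag combinations;
a human discards the bad ones without noticing.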
But in a computer, as soon as one leaves deterministic parsing, the
parsing becomes slow. There is a non-deterministic version of Bison, and
it is reported on the GCC list to be slow, I am told. Also, extracting
semantic information from natural language is difficult for similar
reasons: one has to try a number of different patterns.
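The cost of keeping many candidate parses alive can be seen in a toy
count. Under the classically ambiguous grammar E -> E E | "a" (my own
example, not anything taken from Bison), the number of parse trees of a
string of n "a"s is the Catalan number, which grows exponentially:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parses(n):
    """Parse trees of "a"*n under the ambiguous grammar E -> E E | "a"."""
    if n == 1:
        return 1
    # Split into a left part of length k and a right part of length n - k.
    return sum(parses(k) * parses(n - k) for k in range(1, n))

print([parses(n) for n in range(1, 9)])
# [1, 1, 2, 5, 14, 42, 132, 429] -- the Catalan numbers
```

A non-deterministic parser does not necessarily enumerate all of these,
but it has to manage that ambiguity somehow, which is where the slowness
comes from.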
So for authoring languages, one will probably be stuck with deterministic
parsing techniques and the corresponding markup. But it is still the case
that better grammars will decrease the markup needed, and that will also
help the authors using such a language.