Parser Combinators

Erik Zuurbier F.S.A.Zuurbier@inter.nl.net
Sun, 14 Dec 1997 18:52:48 +0100


I am sending this message again: no answers the first time.
This is the last time, I promise :-)
============
I worked with the Parser Combinators as described in the Clean Book
currently being developed. One combinator is 'twopass', which takes a parser
for a single token and a parser that transforms a list of such tokens into
the desired output.
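
For concreteness, here is roughly the setup in Haskell notation rather than
Clean. This is my own approximation of the book's combinators, so the names
and details may differ from the real thing:

type Parser s r = [s] -> [([s], r)]     -- the 'list of successes' type

-- Greedy repetition, longest match first. The book writes this <*>,
-- but that name is taken in Haskell, so I call it 'star' here.
star :: Parser s a -> Parser s [a]
star p inp =
  [ (rest, x:xs) | (inp', x) <- p inp, (rest, xs) <- star p inp' ]
  ++ [ (inp, []) ]                      -- the empty repetition comes last

-- Pass 1 tokenises the input, pass 2 parses the token list.
-- (The book's version may insist that 'parser' consumes all of its
-- tokens; for simplicity I keep every success here.)
twopass :: Parser s t -> Parser t r -> Parser s r
twopass lexer parser inp =
  [ (rest, r)
  | (rest, toks) <- star lexer inp      -- longest token list first
  , (_, r)       <- parser toks ]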

In a very liberal sense of the word, twopass is not lazy: the first list of
tokens that is fed to the second parser is the longest list that can be
derived from the input. If the input list happens to be infinitely long, and
it also happens to contain valid tokens only, twopass will try to construct
that entire list, and a 'heap full' run-time error occurs, even if the final
result would be happy with, say, the first ten tokens.
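
A made-up example of the failure mode: with a single-digit lexer, 'star'
must reach the end of the input before it can deliver its first (longest)
success, so on an infinite input nothing is ever produced:

digit :: Parser Char Char
digit (c:cs) | '0' <= c && c <= '9' = [(cs, c)]
digit _                             = []

-- star digit (cycle "123") recurses forever before yielding a first
-- success, so twopass digit p (cycle "123") exhausts the heap even if
-- p would be happy with the first ten tokens.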

The twopass combinator makes use of the <*> transformer (also in the book)
to construct the list of tokens. The way <*> works dictates that the longest
possible list is tried first. But <*> can be changed so that the shortest
possible list is tried first. That makes twopass 'lazy', but it introduces a
terrible inefficiency. Say the final result of the parser needs the tokens
[1,2,3,4,5,6,7,8,9]. Then twopass will first construct the empty list [] and
feed it to the second parser. That fails. Then it tries [1], [1,2], [1,2,3]
and so on, and the second parser processes each of these prefixes again from
scratch. Only the last one succeeds. Since each prefix is re-parsed in full,
the total work is 1+2+...+n token reads, so the time complexity is O(n^2).
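
In the notation above, the shortest-first variant is just the two
alternatives of 'star' swapped:

-- Shortest match first: the empty repetition now comes first.
starShort :: Parser s a -> Parser s [a]
starShort p inp =
  (inp, []) :
  [ (rest, x:xs) | (inp', x) <- p inp, (rest, xs) <- starShort p inp' ]

-- twopass built on starShort feeds [], [1], [1,2], ... to the second
-- parser, re-parsing every prefix from scratch; hence the O(n^2).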

So my question is: does anybody have a definition of twopass that is 'lazy'
in the liberal sense used above, that reaches a time complexity of O(n), and
that is as sophisticated as the version in the book, in the sense that it
finds just as many successes (in the 'list of successes')?
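
For illustration, one obvious but unsatisfying direction (my sketch, not
from the book) is to commit to the first lexing success at every step and
produce the token list lazily. That is lazy and runs in O(n), but it gives
up backtracking over the tokenisation, and with it some successes, which is
exactly the sophistication I want to keep:

-- Commit to the first success of the lexer at each step, so tokens are
-- produced on demand. Assumes the lexer always consumes some input.
lazyTokens :: Parser s t -> [s] -> [t]
lazyTokens lexer inp =
  case lexer inp of
    ((rest, t):_) -> t : lazyTokens lexer rest
    []            -> []

-- Lazy and O(n), but it no longer backtracks over the tokenisation,
-- and it discards whatever input the lexer did not consume.
twopassLazy :: Parser s t -> Parser t r -> Parser s r
twopassLazy lexer parser inp =
  [ ([], r) | (_, r) <- parser (lazyTokens lexer inp) ]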

Regards, Erik Zuurbier