[clean-list] Wish list, part 3

Richard A. O'Keefe ok@cs.otago.ac.nz
Fri, 12 Apr 2002 12:54:24 +1200 (NZST)


Someone wrote:
	
	> > - The clean rules for identifiers are such that they can either
	> > consist of ordinary characters, or of some special characters, but
	> > not both. Otherwise the compiler has trouble interpreting things like
	> > 'n+1'. In my opinion this is a pity, because:
	> > a) I have no trouble writing 'n + 1' (i.e. with spaces, which I often
	> > do anyway)
	> > b) I prefer a simple rule that all symbols are separated by
	> > whitespace (and brackets)
	> > c) I would very much like to markup my symbols like >add<,  >sub< for
	> > dyadic operators, or symbol+, symbol*, symbol? for parsing functions,
	> > or @node for labels, or <elem> and </elem> for constructors.
	> > d) I don't believe that things like (<?@) make your code any more
	> > readable (taken from parser combinator stuff).

And someone else wrote:
	
	There may be a solution where n+1 would mean what it means today
	in Clean - OR - it would refer to an object named n+1, depending on
	whether such an object is in the name space, and depending on which of
	these alternative meanings typecheck.

YOW!  Are we in never-never land here, or in a world where programming
is done by fallible human beings?  Clean is supposed to be useful for
writing real programs; a notation where you can't tell where the words
begin and end without a map would be disastrous.

There's only one language family I use regularly where "all symbols are
separated by whitespace (and brackets)", and that's Lisp.  Lisp can get
away with it because it has *no* infix operators.  x+y might as well be
a symbol because there isn't anything else it could be.

At the last count I had used over 120 different programming languages,
including some without operators, some with a fixed set of operators,
and some with user-defined operators.

There is precisely one language that I've come across that lets you have
+ operators, including user-defined operators,
AND
+ tokens containing both letters and operator symbols.
That's Pop-11.  If I recall correctly, the syntax is like this:

    <word>       ::= <syllable 1> {'_' <syllable n>}*
    <syllable 1> ::= <letter> {<letter> | <digit>}*
                  |  <operator character>+
    <syllable n> ::= {<letter> | <digit>}+
                  |  <operator character>+

That made tokens like symbol_+, symbol_*, and symbol_? available,
while in symbol+..., symbol*..., and symbol?... the letters and the
operator characters would have to be in separate tokens, making x+1
unambiguously three tokens and x_+ 1 unambiguously two tokens.
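
To make the tokenisation concrete, here is a minimal sketch of a lexer for
the grammar as recalled above, written in Python rather than Pop-11.  The
set of operator characters, the separate rule for numbers, and the name
tokenize are assumptions made for illustration; none of them is taken from
Pop-11's actual lexical rules.

```python
import re

# Assumed character classes (illustrative, not Pop-11's real ones).
LETTER = r"[A-Za-z]"
ALNUM = r"[A-Za-z0-9]"
OP = r"[+\-*/<>=?@!&|^~:]"

# <syllable 1> ::= <letter> {<letter> | <digit>}* | <operator character>+
SYLLABLE_1 = rf"(?:{LETTER}{ALNUM}*|{OP}+)"
# <syllable n> ::= {<letter> | <digit>}+ | <operator character>+
SYLLABLE_N = rf"(?:{ALNUM}+|{OP}+)"
# <word> ::= <syllable 1> {'_' <syllable n>}*
WORD = rf"{SYLLABLE_1}(?:_{SYLLABLE_N})*"
# Numbers are not covered by the grammar above; this rule is an assumption.
NUMBER = r"[0-9]+"

TOKEN = re.compile(rf"\s*({WORD}|{NUMBER})")


def tokenize(text):
    """Split text into words and numbers, skipping whitespace."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at position {pos}: {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(tokenize("x+1"))       # letters and operator characters split apart
print(tokenize("x_+ 1"))     # the underscore joins the two syllables
print(tokenize("symbol_?"))  # a single marked-up word
```

Under these assumptions a letter syllable and an operator syllable can only
be joined by an explicit underscore, so the reader never needs to consult
the name space to see where one token ends and the next begins.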

The only thing I personally would change in Clean's syntax is to
change 'if e1 e2 e3' to 'if e1 then e2 else e3'.  This would actually
reduce the number of tokens I have to write, because I usually have
to wrap parentheses around the expressions so they are parsed correctly.
Clean's list syntax is _perfect_, and I wish Haskell would adopt it.

I suggest that the remedy for anyone who doesn't like Clean's syntax,
especially someone who is happy writing combinator-based parsers, is
to write a preprocessor to convert the syntax they do like to Clean.
(And of course share it to build up the number of 'converts'.)