[clean-list] UNIX (or POSIX) I/O

Marco Kesseler m.kesseler@xs4all.nl
Sun, 23 May 2004 13:03:15 +0200


Hi,

This problem is caused by the laziness of the (1+state) expression.
Clean does not need the value of this expression at the moment it
returns the World value. As a result, your heap fills up with the
unevaluated expression:

1 + (1 + (... (1 + 0)))

If you run a heap profile, you will see that the heap is basically full
of "+".

This can be solved, for example, by making the state result strict
(the Int comes second, to match the tuple your process returns):
process :: *File *World -> ((Bool, *World), !Int)
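
An alternative along the same lines, as an untested sketch: make the
counter argument of readlines strict, so that each (count+1) is
evaluated before the recursive call instead of being stored as a
closure. The module name, the file name, the abort message and the
explicit type signatures below are assumptions for illustration; they
are not taken from your program.

```clean
module countlines

import StdEnv

// "input.txt" is a placeholder file name (assumption).
Start world
	# (success, file, world) = fopen "input.txt" FReadText world
	| not success = abort "could not open file\n"
	= process file world
where
	// Strict Int in the result, as suggested above.
	process :: *File *World -> ((Bool, *World), !Int)
	process file world
		# (file, count) = readlines file 0
		= (fclose file world, count)

	// The ! on the Int argument makes the accumulator strict, so no
	// chain of unevaluated "+" nodes should build up on the heap.
	readlines :: !*File !Int -> (!*File, !Int)
	readlines file count
		# (eof, file) = fend file
		| eof = (file, count)
		# (_, file) = freadline file
		= readlines file (count + 1)
```

If this behaves as intended, a heap profile should no longer be
dominated by "+".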

I find that dealing with unexpected memory behaviour is one of the most
difficult things in Clean.

Regards,
Marco

> -----Original Message-----
> From: clean-list-admin@cs.kun.nl 
> [mailto:clean-list-admin@cs.kun.nl] On Behalf Of Maks Verver
> Sent: vrijdag 21 mei 2004 18:24
> To: 'Donn Cave'
> Cc: clean-list@cs.kun.nl
> Subject: RE: [clean-list] UNIX (or POSIX) I/O
> 
> 
> Hi again,
> 
> In my testcase the memory leak went away when I did not 
> return the *World object back to the main rule. For example, 
> the following code works correctly (and returns the number of 
> lines in a text file):
> 
> ------ 8< ---------------
> 
> import StdEnv
> 
> Start world 
> # (success, file, world) = fopen "D:\\Temp\\Track.sql" FReadText world
> | success = snd (process file world)
> 
> where
> 	process file world
> 	# (file, state) = readlines file 0
> 	= ( fclose file world, state )
> 
> 	readlines file state
> 	# (eof, file) = fend file
> 	| eof = ( file, state )
> 	# (_, file) = freadline file
> 	= readlines file (1+state)
> 
> ---------------- >8 -----
> 
> Now, if I remove the application of 'snd', so that not just the 
> line count but also the *World object is printed by the 
> application, the Start rule becomes:
> 
>     Start world 
>     # (success, file, world) = fopen "D:\\Temp\\Track.sql" FReadText world
>     | success = process file world
> 
> For small files, the program returns something like 
> ((True,65536), 3) for a three-line input file, but for large 
> files the program runs out of heap space, except when I set 
> the heap size pretty high and stack size pretty low, in which 
> case a stack overflow occurs before the heap is full. I think 
> that the compiler doesn't generate the correct code for the 
> tail recursive call in 'readlines' for some reason. I have no 
> idea how this works internally or why this happens.
> 
> By the way, the 'working' version is pretty fast. It takes 
> 7.5 seconds to count the 2,864,468 lines in a 271,822,936 
> byte file; that's slightly less than Perl on the same 
> machine, so I guess it's pretty competitive. No need to 
> re-implement it for performance reasons.
> 
> Kind regards,
> Maks Verver.