[clean-list] Re:(digest): Reading Files in Clean

Siegfried Gonzi siegfried.gonzi@kfunigraz.ac.at
Sun, 26 Aug 2001 13:28:51 +0200


>
> From: Siegfried Gonzi <siegfried.gonzi@kfunigraz.ac.at>

> In C I would use "scanf" with the appropriate format conversions in a loop. But how do I do this in Clean? I think such an example is of interest to everyone. What about the commas in the file?

Here are my findings so far. I am posting them because they may be of interest to other Clean users too. A file of the following form can be read and stored in an array:

12.23    12.34    12.23
12.34    56.45    23.45
23.6    3.4    1.003
...        ...        ...

My trial-and-error program:

module readfile1
import StdEnv,StdFile

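readfilee::  !Int  {*{#Real}} !File -> !Real
readfilee  n marray file
     | n == 102000 = marray.[0,0]      // Change the dimension n here; see also marray!
     #! (b1,c1,file) = sfreadr file    // read the three reals of line n
     #! (b2,c2,file) = sfreadr file
     #! (b3,c3,file) = sfreadr file
     | b1 && b2 && b3 = readfilee (n+1) {marray & [n,0] = c1, [n,1] = c2, [n,2] = c3} file
     // There is no alternative for the case that one of b1, b2, b3 is False
     // (end of file, or a value that cannot be read as a real): it gives a run-time error.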


Extract:: !String {*{#Real}} !*Files -> (!Real,!*Files)
Extract inputfile marray files
     # (readok, infile, files) = sfopen inputfile FReadText files   // open as a shared File for reading
     | not readok = abort ("Cannot read " +++ inputfile)
     | otherwise  = (readfilee 0 marray infile, files)

inputfile :== "input.txt"

//Start:: *World -> {*{#Real}}
Start:: *World  -> (!Real,*World)
Start world  = accFiles (Extract inputfile marray) world
where
     marray = {{0.0\\x <- [0..2]}\\y <- [0..102000]}




This program reads the floating-point data and stores it in an array (for convenience I print only one value). I tried it with large data: 100000 lines of 3 columns (yes, I have to deal
with such big data sets). Reading and storing takes 16 s on my Mac (Yorick does it in 10 s, though Yorick is tuned for data evaluation).

But some questions remain:

--Is a construct such as "Start :: *World -> !Real" (or "Start :: *World -> {*{#Real}}") also possible? With "accFiles" the result has to be paired with the world: (!Real,*World) or ({*{#Real}},*World).

--Is there an array index in Clean that should run fastest (in C it is the column index, in Fortran the row index)? The manual says that array access takes constant time, but does this hold for
both dimensions?

--Clean users should be aware of the following: if you open a file to read floating-point numbers but the file contains other characters, the program stops with a run-time error saying that the
function readfilee does not match. Do not be confused: the message is about the guards of the function, not about the format of the file. Since sfreadr fails, b1 && b2 && b3 is False, and readfilee has no alternative for that case. Out of curiosity, I first tried to read:

12.332,12.34,12.34
34.4,55.344,23.23


At first I thought my program was wrong, but then I realized that the format of the file itself was not suitable.
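For the comma-separated case, one possible approach is to stop using sfreadr and instead read whole lines and split them at the commas. This is only a sketch under assumptions: it relies on the StdEnv functions fopen, fclose, freadline, isSpace, the string slice operator % and the toReal instance for strings, while splitOn, strip and readLines are hypothetical helper names introduced here. It is meant for comma-separated lines only, and it has not been timed against the sfreadr version.

```clean
module readcsv

import StdEnv

// Split a string at every occurrence of sep:
// splitOn ',' "1.5,2.0" gives ["1.5","2.0"].
splitOn :: !Char !String -> [String]
splitOn sep s = go 0 0
where
    n = size s
    go start i
        | i >= n       = [s % (start, n-1)]
        | s.[i] == sep = [s % (start, i-1) : go (i+1) (i+1)]
        | otherwise    = go start (i+1)

// Remove leading and trailing white space (blanks, tabs, '\r', '\n').
strip :: !String -> String
strip s = s % (lo 0, hi (size s - 1))
where
    lo i
        | i < size s && isSpace s.[i] = lo (i+1)
        | otherwise                   = i
    hi i
        | i >= 0 && isSpace s.[i] = hi (i-1)
        | otherwise               = i

// Read the rest of the file line by line; every line becomes a list of reals.
// freadline returns an empty string at the end of the file.
readLines :: !*File -> ([[Real]], !*File)
readLines file
    # (line, file) = freadline file
    | size line == 0 = ([], file)
    # (rows, file) = readLines file
    = ([map (toReal o strip) (splitOn ',' line) : rows], file)

Start :: *World -> ([[Real]], *World)
Start world
    # (ok, file, world) = fopen "input.txt" FReadText world
    | not ok = abort "Cannot open input.txt\n"
    # (rows, file) = readLines file
    # (_, world) = fclose file world
    = (rows, world)
```

Unlike sfreadr, this version has no Bool telling whether a field was a valid number, so malformed fields are not reported. To get an array like the one in the program above, the list could then be converted with an array comprehension along the lines of { {#x \\ x <- row} \\ row <- rows }.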
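On the question of which index should run fastest: as far as I understand Clean's array representation (an assumption that should be checked against the language report), a {{#Real}} or {*{#Real}} is a boxed array of references to rows, while each {#Real} row stores its reals unboxed, one after another. Both indices are then constant-time, but a.[i,j] first follows the reference to row i, so running the column index j in the inner loop, as in C, walks through consecutive memory. The hypothetical function sumAll below sketches such a row-by-row traversal:

```clean
module sumrows

import StdEnv

// Sum all elements of a two-dimensional array stored as an array of rows.
// The outer loop walks over the rows (index i); the inner loop walks along
// one unboxed {#Real} row (index j).
sumAll :: !{{#Real}} -> Real
sumAll a = rows 0 0.0
where
    rows i acc
        | i >= size a = acc
        | otherwise   = rows (i+1) (cols a.[i] 0 acc)

    // Add the elements of a single row to the accumulator.
    cols :: !{#Real} !Int !Real -> Real
    cols row j acc
        | j >= size row = acc
        | otherwise     = cols row (j+1) (acc + row.[j])

Start :: Real
Start = sumAll {{#1.0, 2.0, 3.0}, {#4.0, 5.0, 6.0}}
```

Each row is selected only once in the outer loop, so the inner loop indexes into a single unboxed row instead of going through the outer array for every element.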


S. Gonzi