[clean-list] isLower isUpper

Marco Kesseler m.kesseler@aia-itp.com
Wed, 8 May 2002 16:19:47 +0200


From: <F.S.A.Zuurbier@inter.nl.net>
> Would the following be possible to solve the problem?
...
> class FileSystem f where
> fopenASCII :: !{#Char} !Int !*f -> (!Bool,!*File A,!*f)
> fopenUnicode :: !{#Char} !Int !*f -> (!Bool,!*File U,!*f)
> fopenMac :: !{#Char} !Int !*f -> (!Bool,!*File M,!*f)
> fopenWindows :: !{#Char} !Int !*f -> (!Bool,!*File W,!*f)
...
> The functions that triggered this thread would get the types:
>
> isUpper :: (!Char c) -> Bool
> isLower :: (!Char c) -> Bool
>
> They would be methods in a class, with instances for all code pages.
>
> This way, it should be possible to use the type system to process
> different character code pages correctly. I would rather the code page
> information were part of the file information, but this, I guess, is not
> reality.

There are far more code pages than just Windows, Mac, ASCII and Unicode.
Having class instances for all of them is quite cumbersome. Furthermore,
some code pages are subsets of others, so you probably want to have a system
that is able to exploit this. This also seems to be a rather static
solution: if you have built an application and forgot to include support for
a particular code page, your customers cannot just extend its functionality
by supplying or changing a table. And finally, insisting on having
single-byte characters internally makes your system more restrictive than is
necessary.
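
To make the "supplying a table" point concrete, here is a minimal sketch in
Python (not Clean, and not anything from the Clean libraries). All names in
it (ASCII_TABLE, make_code_page, decode, WINDOWS_SAMPLE, MAC_SAMPLE) are
hypothetical, and the two sample tables are deliberately incomplete. It only
shows that a code page can be plain data layered on a shared ASCII subset,
so a new one can be added without touching any class hierarchy.

```python
from typing import Dict

# Shared subset: the 7-bit ASCII range maps to itself in every table below.
ASCII_TABLE: Dict[int, str] = {b: chr(b) for b in range(0x80)}


def make_code_page(high_half: Dict[int, str]) -> Dict[int, str]:
    """Build a byte -> character table by extending the ASCII subset."""
    table = dict(ASCII_TABLE)
    table.update(high_half)
    return table


def decode(data: bytes, table: Dict[int, str]) -> str:
    """Decode single-byte data; unmapped bytes become U+FFFD."""
    return "".join(table.get(b, "\ufffd") for b in data)


# Illustrative one-entry tables (assumption: real tables would be loaded
# from data files and cover the whole upper half).
WINDOWS_SAMPLE = make_code_page({0xE9: "\u00e9"})  # 0xE9 is e-acute in Windows-1252
MAC_SAMPLE = make_code_page({0xE9: "\u00c8"})      # 0xE9 is E-grave in Mac OS Roman
```

With tables like these, supporting an extra code page is a matter of loading
one more mapping at run time, which a fixed set of class instances cannot do.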

> How does this work out on the screen? Clean already has a module
> deltaSystem.icl that contains platform-dependent stuff such as
> WinGetHorzResolution. Maybe that would be the place to put a platform's
> display code page type (i.e. A, U, M, or W).
>
> I realize that changing the types of File and Char may not be really
> necessary. We already stuff an object of type File with information that
> controls subsequent read and write operations: FReadData etc. But that way,
> programming errors generate a runtime error, and are not caught at compile
> time.

I think the actual problem is that someone supplies a file with single-byte
characters in a different code page than the system expects. The encoding is
a property of the data, which only arrives at run time, so it will never be
possible to check this at compile time.

I think that it is far easier to convert single-byte files to Unicode at the
boundaries of your system, and have Unicode internally. See for example how
XML deals with this.
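
As a rough sketch of what converting at the boundary looks like, again in
Python rather than Clean: the helper name read_text is hypothetical, while
"cp1252" and "mac_roman" are codec names from Python's standard library. The
same byte is a lowercase letter under one code page and an uppercase letter
under the other, so the isLower/isUpper question can only be answered after
decoding to Unicode.

```python
def read_text(raw: bytes, code_page: str) -> str:
    """Decode single-byte input to Unicode once, at the system boundary."""
    return raw.decode(code_page)


raw = b"\xe9"  # one byte; its meaning depends on the code page

as_windows = read_text(raw, "cp1252")    # U+00E9, e with acute
as_mac = read_text(raw, "mac_roman")     # U+00C8, E with grave

# After decoding, case tests operate on Unicode characters only.
windows_is_lower = as_windows.islower()
mac_is_upper = as_mac.isupper()
```

Everything behind read_text deals with Unicode strings only; the code page
name is needed once, when the bytes enter the system.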

regards,
Marco

----------------------------------------------------------------------
Aia Software B.V.                     Phone :  +31 24 371 02 30
PO Box 38025                          Fax   :  +31 24 371 02 31
6503 AA Nijmegen                      URL   :  http://www.aia-itp.com
The Netherlands
----------------------------------------------------------------------