[clean-list] isLower isUpper

Richard A. O'Keefe ok@cs.otago.ac.nz
Wed, 8 May 2002 14:56:06 +1200 (NZST)


"Marco Kesseler" <m.kesseler@aia-itp.com> wrote:
	The question is of course, what codepage Char is in.
	There is no portable way to classify ä, é, î, ç and ñ
	without knowing what codepage one is dealing with.
	
If the Clean system has any C code in it, there _is_ a portable way.

	setlocale(LC_CTYPE, "");

is supposed to tell the functions in <ctype.h> to use the "native" locale,
however that is defined on the host machine.
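
To make that concrete, here is a minimal sketch of how the C side might
wrap this.  The names use_native_ctype, classify_byte and char_case are
made up for illustration; they are not part of Clean or of any existing
library.  The only standard calls used are setlocale, islower and isupper.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <locale.h>

/* Hypothetical result type for this sketch. */
typedef enum { CASE_OTHER, CASE_LOWER, CASE_UPPER } char_case;

/* Ask for the host's native character classification.
   Returns 1 on success, 0 if the locale could not be selected
   (in which case the previous locale stays in effect). */
int use_native_ctype(void)
{
    return setlocale(LC_CTYPE, "") != NULL;
}

/* Classify one byte under whatever LC_CTYPE locale is current.
   The <ctype.h> functions are only defined for EOF and for values
   representable as unsigned char, so callers holding a plain char
   should cast it: classify_byte((unsigned char)ch). */
char_case classify_byte(int c)
{
    assert(c >= 0 && c <= UCHAR_MAX);
    if (islower(c)) return CASE_LOWER;
    if (isupper(c)) return CASE_UPPER;
    return CASE_OTHER;
}
```

In the default "C" locale only a-z and A-Z count as letters, so
classify_byte(0xE4) gives CASE_OTHER there.  After a successful
use_native_ctype() on a host whose native locale is ISO 8859-1, one
would expect CASE_LOWER instead, but that depends entirely on the host.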

Of course this assumes an environment where all the files are in the same
character set, which, though not always true, is at least better than assuming
that everything is in ASCII.

	Hence the need for Unicode.
	
Unfortunately, that does not solve the problem.  The problem is existing
data without any ISO 2022 "announcer" sequence to tell you which registered
character set is used.  Without that information, you can't translate the
data to Unicode, so Unicode doesn't help.

Thanks to all the free tables at www.unicode.org, it's quite easy to
classify characters portably, once you know which character set to use.
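
As a rough sketch of what "once you know which character set" means in
code, the classifier below takes the character set as an explicit
argument.  The type and function names are hypothetical, only two
character sets are covered, and the ranges were written out by hand
for U+0000..U+00FF; a real version would generate its tables from the
data files at www.unicode.org.

```c
#include <assert.h>

/* Hypothetical names for this sketch. */
typedef enum { CS_US_ASCII, CS_ISO_8859_1 } charset;
typedef enum { CLS_OTHER, CLS_LOWER, CLS_UPPER } char_class;

/* Classify a Unicode code point in U+0000..U+00FF.
   Hex constants are used so the code does not depend on the
   compiler's own execution character set. */
static char_class classify_low_codepoint(unsigned cp)
{
    if (cp >= 0x61 && cp <= 0x7A) return CLS_LOWER;   /* a-z */
    if (cp >= 0x41 && cp <= 0x5A) return CLS_UPPER;   /* A-Z */
    if (cp == 0xB5) return CLS_LOWER;                 /* micro sign */
    if (cp == 0xD7 || cp == 0xF7) return CLS_OTHER;   /* multiply, divide */
    if (cp >= 0xC0 && cp <= 0xDE) return CLS_UPPER;   /* A-grave .. Thorn */
    if (cp >= 0xDF && cp <= 0xFF) return CLS_LOWER;   /* sharp s .. y-diaeresis */
    return CLS_OTHER;
}

/* Classify a byte, given the character set it is known to be in. */
char_class classify_in(charset cs, unsigned char byte)
{
    switch (cs) {
    case CS_US_ASCII:
        /* Bytes above 0x7F are not ASCII at all. */
        return byte > 0x7F ? CLS_OTHER : classify_low_codepoint(byte);
    case CS_ISO_8859_1:
        /* ISO 8859-1 bytes map one-to-one onto U+0000..U+00FF. */
        return classify_low_codepoint(byte);
    }
    assert(0 && "unknown charset");
    return CLS_OTHER;
}
```

The same byte 0xE4 is CLS_LOWER as ISO 8859-1 and CLS_OTHER as
US-ASCII, which is the whole point: the answer depends on the
character set argument and not on the machine the code runs on.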