[FieldTrip] skewed classes for pattern classification

Wed Jun 13 21:01:28 CEST 2012

I am trying out some pattern classification in FieldTrip.

Mostly using:
cfg.method = 'crossvalidate';
cfg.mva = {dml.standardizer dml.glmnet('family','binomial')};

I am trying to predict task accuracy (0, 1) in cases where accuracy is often around 80%, so the correct class occurs far more often than the incorrect class.

According to its help text, dml.crossvalidate can handle this as follows:
%   In order to balance the occurrence of different classes one may set
%   'resample' equal to true (default: false). Resample will upsample less
%   occurring classes during training and downsample often occurring
%   classes during testing.

… but this requires tossing a lot of data in the downsampling process.

Has anybody tried other approaches for dealing with skewed classes that do not involve downsampling? For example, something like the following:

Loss functions allowing for unbalanced classes
The classification performance is always evaluated by some loss
function, see the section Estimation of the generalization
error. Typical examples are the 0/1-loss (i.e., average number of
misclassified samples) and the area under the receiver operator
characteristic (ROC) curve (Fawcett, 2006). When using misclassification
rate, it must be assured that the classes have approximately
the same number of samples. Otherwise, the employed performance
measure has to consider the different class prior probabilities.
For instance, in oddball paradigms the task is to discriminate
brain responses to an attended rare stimulus from responses to a
frequent stimulus. A typical ratio of frequent-to-rare stimuli is
85:15. In such a setting, an uninformative classifier which
always predicts the majority class would obtain an accuracy of 85%.
Accordingly, a different loss function needs to be employed. Denoting
the number of samples in class i by n_i, the normalized error can be
calculated as weighted average, where errors committed on samples
of class i are weighted by N/n_i with N = Σ_k n_k:

From
S. Lemm, B. Blankertz, T. Dickhaus, K. R. Müller,
Introduction to machine learning for brain imaging
NeuroImage, 56:387-399, 2011
http://doc.ml.tu-berlin.de/bbci/publications/LemBlaDicMue11.pdf
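
To make the weighting in the passage above concrete, here is a minimal MATLAB sketch of how I read it. It is not taken from FieldTrip or DMLT; the labels, the predictions, and all variable names are made up for illustration. Each misclassified sample of class i counts with weight N/n_i, and the total is divided by the sum of all weights, which works out to the mean of the per-class error rates.

```matlab
% Hypothetical example data: 1 = correct trial, 0 = error trial (80:20 split).
y    = [ones(80,1); zeros(20,1)];   % true labels
yhat = ones(100,1);                 % classifier that always predicts the majority class

classes = unique(y);
N = numel(y);
w = zeros(N,1);
for k = 1:numel(classes)
    idx = (y == classes(k));
    w(idx) = N / sum(idx);          % weight N/n_i for every sample of class i
end

wrong  = double(yhat ~= y);         % 1 where the prediction is wrong
err01  = mean(wrong);               % plain 0/1 loss
errbal = sum(w .* wrong) / sum(w);  % class-weighted (normalized) error

fprintf('0/1 loss: %.2f, class-weighted loss: %.2f\n', err01, errbal);
% prints: 0/1 loss: 0.20, class-weighted loss: 0.50
```

With these made-up labels the majority-class predictor looks good under the 0/1 loss (0.20) but sits at chance (0.50) under the weighted loss, which is the behaviour the passage is after. Note that this only changes how performance is scored after the fact, not how the classifier is trained.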

Not being a very good programmer, I got lost in the code before I could find the relevant cost function where this normalization would be applied.

Any advice on these issues would be much appreciated.

Thanks,
Tim
