[FieldTrip] skewed classes for pattern classification

Boris Reuderink b.reuderink at donders.ru.nl
Thu Jun 14 11:40:55 CEST 2012


Dear Tim,

You pose a very interesting question, one that is not asked often
enough in my opinion :). There are indeed various approaches to the
classification and evaluation of imbalanced data.

One option is indeed resampling, which is in effect a way to fake a
per-instance importance weight. Some classifiers, libsvm for example,
support such weights directly. I am not familiar enough with the DMLT
methods in FieldTrip to know whether similar options exist. BTW, I am
not sure that resampling is needed for testing at all; if it is, it
can perhaps be performed inside the performance metric instead. That
might reduce the amount of 'tossed' data.
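
To make the weighting idea concrete outside of MATLAB, here is a
minimal Python/scikit-learn sketch (an illustration only, not DMLT;
the toy data and weight formula are made up for the example). Class
weights inversely proportional to class frequency give the same model
that balanced resampling aims for, without duplicating or discarding
trials:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 400
# Toy data: class 1 is rare (~20%) and slightly shifted from class 0.
y = (rng.random(n) < 0.2).astype(int)
X = rng.normal(size=(n, 2)) + y[:, None] * 0.75

# Class weights inversely proportional to class frequency: the same
# effect as resampling to balance, without tossing any trials.
clf = LogisticRegression(class_weight='balanced').fit(X, y)

# The equivalent per-instance weights, as weight-aware classifiers take:
n0, n1 = (y == 0).sum(), (y == 1).sum()
w = np.where(y == 1, n / (2.0 * n1), n / (2.0 * n0))
clf_w = LogisticRegression().fit(X, y, sample_weight=w)

# Both routes yield the same fitted model.
print(np.allclose(clf.coef_, clf_w.coef_))
```

The per-instance route is what libsvm-style weighting boils down to:
each rare-class trial simply counts more in the training loss.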

Another option is to use a performance metric that is insensitive to
class imbalance, such as the AUC-ROC or mutual information. Note that
this only solves your evaluation problem; if the classifier's
objective is to minimize a form of loss that *is* sensitive to
imbalance, you would probably still get constant prediction of the
most frequent class.
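
To make the evaluation point concrete, a small Python/scikit-learn
sketch (again an illustration, not DMLT): an uninformative classifier
that always predicts the majority class on 85:15 data, like the
oddball example you quote below, looks impressive on accuracy while
AUC-ROC and the class-balanced error both sit at chance:

```python
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             roc_auc_score)

rng = np.random.default_rng(0)
# Simulated labels with roughly an 85:15 class ratio.
y_true = (rng.random(1000) < 0.15).astype(int)

y_const = np.zeros_like(y_true)     # always predict the majority class
scores = np.full(len(y_true), 0.5)  # uninformative decision values

print(accuracy_score(y_true, y_const))           # around 0.85; looks good, isn't
print(roc_auc_score(y_true, scores))             # 0.5: chance level
print(balanced_accuracy_score(y_true, y_const))  # 0.5: chance level
```

Balanced accuracy here is one minus the N/n_i-weighted error from the
Lemm et al. excerpt you quote, so it is a convenient single number
that is robust to the 85:15 ratio.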

Best regards,

Boris

On Wed, Jun 13, 2012 at 9:01 PM, Tim Curran <tim.curran at colorado.edu> wrote:
> I am trying some pattern classification in fieldtrip.
>
> Mostly using:
> cfg.method = 'crossvalidate';
> cfg.mva = {dml.standardizer dml.glmnet('family','binomial')};
>
> I am trying to predict task accuracy (0,1) in cases where accuracy is often
> around 80%, so the frequency of the correct class is much greater than the
> frequency of the incorrect class.
>
>
> dml.crossvalidate can handle this as follows:
> %   In order to balance the occurrence of different classes one may set
> %   'resample' equal to true (default: false). Resample will upsample less
> %   occurring classes during training and downsample often occurring
> %   classes during testing.
>
> … but this requires tossing a lot of data in the downsampling process.
>
>
> Has anybody tried other approaches for dealing with skewed classes that do
> not involve downsampling?  Like this for example:
>
> Loss functions allowing for unbalanced classes
> The classification performance is always evaluated by some loss
> function, see the section Estimation of the generalization
> error. Typical examples are the 0/1-loss (i.e., average number of
> misclassified samples) and the area under the receiver operating
> characteristic (ROC) curve (Fawcett, 2006). When using misclassification
> rate, it must be assured that the classes have approximately
> the same number of samples. Otherwise, the employed performance
> measure has to consider the different class prior probabilities.
> For instance, in oddball paradigms the task is to discriminate
> brain responses to an attended rare stimulus from responses to a
> frequent stimulus. A typical ratio of frequent-to-rare stimuli is
> 85:15. In such a setting, an uninformative classifier which
> always predicts the majority class would obtain an accuracy of 85%.
> Accordingly, a different loss function needs to be employed. Denoting
> the number of samples in class i by n_i, the normalized error can be
> calculated as a weighted average, where errors committed on samples
> of class i are weighted by N/n_i, with N = Σ_k n_k.
>
> From
> S. Lemm, B. Blankertz, T. Dickhaus, K. R. Müller,
> Introduction to machine learning for brain imaging
> NeuroImage, 56:387-399, 2011
> http://doc.ml.tu-berlin.de/bbci/publications/LemBlaDicMue11.pdf
>
>
> Not being a very good programmer, I got lost in the code before I could find
> the relevant cost function to apply normalization.
>
> Any advice on these issues would be much appreciated.
>
> thanks
> Tim
>
>
>
> _______________________________________________
> fieldtrip mailing list
> fieldtrip at donders.ru.nl
> http://mailman.science.ru.nl/mailman/listinfo/fieldtrip



-- 
twitter.com/#!/breuderink | github.com/breuderink | borisreuderink.nl



