[FieldTrip] Question about cluster-based statistical testing (sum of t-stats or suprathreshold t-stats?)

Fri Jan 4 15:50:21 CET 2013

Dear Artemy,

Importantly, these constants also enter in the permutation distribution
that is used to evaluate the significance of the maximum cluster-mass
statistic, to the effect that the Bullmore-style and the Fieldtrip-style
permutation distributions are shifted versions of each other. As a result,
the p-values that roll out of the two approaches are identical.

If I understand correctly, the two methods could yield the same p-values
only if they assign the same rank-ordering to a given set of
clusters.  But I don't think that is the case. Let's imagine that the
t-statistic cutoff 'c' is equal to 1, and the data contains two
suprathreshold clusters (let's say this is a spatial test and the clusters
are composed of electrodes):

- The first cluster has 10 electrodes, each one with a t-statistic equal
to 1.1

- The second cluster has 2 electrodes, both with a t-statistic equal to 3

As I understand, Bullmore's method would assign cluster 1 a mass of
10*(1.1-1) = 1 and cluster 2 a mass of 2*(3-1) = 4, while your method would
assign cluster 1 a mass of 10*1.1 = 11 and cluster 2 a mass of 2*3 = 6.
Hence, given a null distribution, it should be possible to choose a
cluster-based threshold that indicates as significant only cluster 2 under
Bullmore's method, and only cluster 1 under yours.
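
The arithmetic in this example can be checked with a minimal Python sketch.
The function names below are hypothetical and are not part of FieldTrip or
of any published implementation. Note that for a cluster of k electrodes
the two statistics differ by k*c, so the offset depends on cluster size
and is not one constant shared by all clusters.

```python
# Threshold and per-electrode t-values taken from the example above.
c = 1.0
clusters = {
    "cluster 1": [1.1] * 10,  # 10 electrodes, t = 1.1 each
    "cluster 2": [3.0] * 2,   # 2 electrodes, t = 3 each
}


def bullmore_mass(tvals, threshold):
    """Sum of the excess of each t-value over the threshold."""
    return sum(t - threshold for t in tvals)


def fieldtrip_mass(tvals):
    """Sum of the full t-values in the cluster."""
    return sum(tvals)


# Map each cluster name to (Bullmore-style mass, Fieldtrip-style mass).
masses = {
    name: (bullmore_mass(t, c), fieldtrip_mass(t))
    for name, t in clusters.items()
}

largest_bullmore = max(masses, key=lambda name: masses[name][0])
largest_fieldtrip = max(masses, key=lambda name: masses[name][1])

for name, (b, f) in masses.items():
    print(f"{name}: Bullmore-style {b:.1f}, Fieldtrip-style {f:.1f}")
print("largest under Bullmore-style:", largest_bullmore)
print("largest under Fieldtrip-style:", largest_fieldtrip)
```

The two statistics order the clusters in opposite ways: the small,
high-amplitude cluster has the larger Bullmore-style mass, and the large,
low-amplitude cluster has the larger Fieldtrip-style mass.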

I think your reasoning is correct: when the data contain more than one
suprathreshold cluster, my argument does not apply anymore. Your example
shows that the Bullmore- and Fieldtrip-style cluster statistics have
different sensitivities. Thank you for pointing this out. For every test
statistic, the decision based on the permutation p-value controls the
type-I error rate, but the type-II error rate (the complement of
sensitivity) depends on the exact test statistic.

Thanks for confirming that. I should note, though, that this is an
issue even in data without multiple suprathreshold clusters.  The same
logic as above -- which shows that the two measures give different ranks
to the same set of clusters -- also applies to the distribution of clusters
under the null hypothesis.  Thus one can imagine a single cluster in the
data that would be judged significant under Fieldtrip's method and not
significant under Bullmore, or vice-versa.  I believe that generally, in
comparison to Bullmore's method, Fieldtrip's method would tend to favor
judging-as-significant large clusters (with many electrodes).  

Sure, it also applies to the distribution of clusters under the null
hypothesis and, therefore, can result in different p-values for the
Fieldtrip and the Bullmore style cluster statistics, even when there is
only a single cluster in the data. I should have made this more explicit in
my reply, especially since the distribution of clusters under the null
hypothesis is random (and therefore a focus on the
single-cluster-in-the-data situation is not a very useful one).
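
To make the point about the null distribution concrete, here is a rough
Python sketch of a permutation test that tracks both cluster statistics
at once. It is an illustration under simplifying assumptions of my own
(one-sample design, random sign flips per subject, one-sided threshold,
adjacency along a single dimension, simulated data) and is not
FieldTrip's actual implementation; all function names are hypothetical.

```python
import math
import random


def t_values(data):
    """One-sample t-statistic per column; data is a list of subject rows."""
    n = len(data)
    out = []
    for col in zip(*data):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        out.append(mean / math.sqrt(var / n))
    return out


def max_cluster_masses(tvals, c):
    """Largest (Bullmore-style, Fieldtrip-style) mass over runs of
    adjacent t-values above c; (0.0, 0.0) if no value exceeds c."""
    best_b = best_f = 0.0
    run_b = run_f = 0.0
    for t in tvals:
        if t > c:
            run_b += t - c  # excess over the threshold
            run_f += t      # full t-value
            best_b = max(best_b, run_b)
            best_f = max(best_f, run_f)
        else:
            run_b = run_f = 0.0
    return best_b, best_f


def permutation_p_values(data, c, n_perm=500, seed=0):
    """Permutation p-values for both maximum cluster-mass statistics."""
    rng = random.Random(seed)
    obs_b, obs_f = max_cluster_masses(t_values(data), c)
    count_b = count_f = 0
    for _ in range(n_perm):
        # Flip the sign of each subject's whole row at random.
        flipped = []
        for row in data:
            sign = rng.choice((-1.0, 1.0))
            flipped.append([sign * x for x in row])
        perm_b, perm_f = max_cluster_masses(t_values(flipped), c)
        if perm_b >= obs_b:
            count_b += 1
        if perm_f >= obs_f:
            count_f += 1
    return (count_b + 1) / (n_perm + 1), (count_f + 1) / (n_perm + 1)


# Simulated data: 12 subjects, 30 time points, an effect at points 10-15.
rng = random.Random(1)
data = [
    [rng.gauss(0.0, 1.0) + (0.8 if 10 <= j < 16 else 0.0) for j in range(30)]
    for _ in range(12)
]
p_bullmore, p_fieldtrip = permutation_p_values(data, c=1.0)
print("Bullmore-style p:", p_bullmore)
print("Fieldtrip-style p:", p_fieldtrip)
```

Both p-values come from the same permutations, so any difference between
them reflects only the choice of cluster statistic, since each statistic
ranks the permuted clusters in its own way.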

I personally think of the distinction not so much in terms of controlling
sensitivity, but rather as concerning the definition of what counts as a
cluster of interest.  Though both methods look for spatiotemporally
contiguous regions of electrodes that exceed threshold, for Bullmore the
cluster statistic is the sum of the suprathreshold excesses, while for
Fieldtrip it's the sum of the full statistic values in the region.  I'm
quite interested in the question of which gives more justifiable/better
results in real-world settings, though unfortunately I have not seen any
work done on the matter.  From what I have seen in my brief forays into
the extensive analytic + numerical studies of cluster-based significance
testing in the fMRI literature, in that field they always refer to
Bullmore-style clusters.

I do think that sensitivity is a crucial concept here. Neurobiological
data are almost always high-dimensional (data arrays with dimensions
space, time and/or frequency), but the statistical test only answers the
question whether there is some direction in this high-dimensional space
along which the experimental conditions differ. It is crucial that our
test statistics are chosen such that they have a high sensitivity in the
directions that are neurobiologically plausible. In a recent paper in
Psychophysiology (2012), I took some time to explain this issue.

Note that others in the field (e.g., Karl Friston) often put forward the
claim that parametric statistical tests (e.g., the t-statistic) are always
more sensitive than nonparametric statistical tests. This claim only holds
for scalar observations (e.g., an electrical potential measured at one
electrode and one post-stimulus time), which is not the type of data
neurobiological studies are typically interested in.

Thanks for your interest in this issue. You ask the right questions. Also,
my apologies for my earlier sloppy replies.

Eric

Thanks again & happy holidays...

-Artemy
