[FieldTrip] Why does the output from cluster-based permutation tests show a p-value for each cluster, when the test really only has one p-value?

Stephen Politzer-Ahles politzerahless at gmail.com
Sun Feb 22 15:54:12 CET 2026


 Dear fieldtrip community,

(It's been quite a while since I was active on this list, so I apologize if
this has already been discussed and I just missed it!)

When you do a cluster-based permutation test using ft_timelockstatistics
(and I assume also with ft_freqstatistics), one of the fields of the
output, .prob, shows multiple p-values: one for each cluster that was
identified in the initial dataset.

But a cluster-based permutation test actually has only one test statistic.
As described in Maris & Oostenveld (2007)
<https://www.sciencedirect.com/science/article/pii/S0165027007001707>, the
procedure is to find clusters and then take only the largest cluster-level
test statistic. If the function follows this procedure, then presumably all
the other cluster-level test statistics might as well just disappear into
the aether: as far as I am aware, they serve no additional purpose and are
just an intermediate step in the analysis. The cluster-based permutation
test tutorial
<https://www.fieldtriptoolbox.org/tutorial/stats/cluster_permutation_timelock/>
on the fieldtrip wiki is also explicit (in two separate places!) in saying
that "for the statistical inferential decision (i.e. reject or not the null
hypothesis), *only the p-value of the largest cluster is relevant*"
(emphasis added).
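
To make the procedure above concrete, here is a minimal sketch in plain
Python. It is not FieldTrip code and does not reproduce its implementation:
the function names, the one-dimensional (time-only) clustering, the
sign-flip permutation scheme for paired data, the cluster-forming threshold
of 2.0, and the (1 + count) / (1 + n_perm) p-value convention are all
assumptions made for illustration. The point is only that every permutation
contributes a single number (its largest cluster mass), so the test as
described yields a single p-value.

```python
import random
from statistics import mean, stdev

def t_values(diffs):
    """One-sample t statistic per time point.

    diffs[s][t] is subject s's condition difference at time point t.
    """
    n = len(diffs)
    out = []
    for t in range(len(diffs[0])):
        col = [d[t] for d in diffs]
        sd = stdev(col)
        out.append(mean(col) / (sd / n ** 0.5) if sd > 0 else 0.0)
    return out

def cluster_masses(tvals, thresh):
    """Sum |t| over each run of adjacent, same-sign, supra-threshold points."""
    masses, current, sign = [], 0.0, 0
    for t in tvals:
        s = (t > thresh) - (t < -thresh)  # +1, -1, or 0 (below threshold)
        if s != 0 and s == sign:
            current += abs(t)             # extend the running cluster
        else:
            if sign != 0:
                masses.append(current)    # close the previous cluster
            current, sign = (abs(t), s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

def max_cluster_test(diffs, thresh=2.0, n_perm=500, seed=0):
    """Return (largest observed cluster mass, its p-value, all observed masses)."""
    rng = random.Random(seed)
    observed = cluster_masses(t_values(diffs), thresh)
    obs_max = max(observed, default=0.0)
    null_max = []
    for _ in range(n_perm):
        # Under the null, each subject's difference is equally likely to
        # have either sign, so flip whole subjects at random.
        flipped = []
        for d in diffs:
            sgn = rng.choice((-1, 1))
            flipped.append([sgn * x for x in d])
        # Only the largest cluster mass of each permutation is kept.
        null_max.append(max(cluster_masses(t_values(flipped), thresh), default=0.0))
    p = (1 + sum(m >= obs_max for m in null_max)) / (1 + n_perm)
    return obs_max, p, observed
```

In this sketch the list of all observed cluster masses is returned only so
the intermediate step can be inspected; the decision about the null
hypothesis uses obs_max and p alone.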

I suspect this may have led to some confusion, at least in my area of
research (neurolinguistics). Over the past 15ish years there have been many
papers reporting findings along the lines of "this contrast yielded two
significant clusters, one here and one there!", and I suspect much of this
came from people seeing multiple *p*<.05 entries in their stats.prob array
(I did some of this myself back in the day when I was just beginning to use
these tests). Granted, there has been information out for over a decade
saying not to do that, e.g., How not to interpret results from a
cluster-based permutation test
<https://www.fieldtriptoolbox.org/faq/stats/clusterstats_interpretation/>
on the fieldtrip wiki, and Sassenhagen et al. [2019]
<https://onlinelibrary.wiley.com/doi/10.1111/psyp.13335>. These are more
focused on a slightly different framing of the issue (they're more about
how we shouldn't use these tests to make claims about where the
significance of the effect begins and ends), but they also entail this same
point: the test doesn't license conclusions about there being multiple
clusters, for the same reasons it doesn't license conclusions about where a
given cluster begins and ends. But not all users read all of this before
just running the code, seeing p-values, and going with them. And sure, I
get that it's a user's responsibility to RTFM, but the code seems to be
really tempting people to look at those other p-values, whereas all the
information telling people they shouldn't look at those p-values is much
harder to find. (Also, I recognize that fieldtrip is not the only software
out there that does these tests, but as far as I know it was the first, and
I think a lot of people using these tests first learned from fieldtrip.)

So, with all that in mind, I'm just wondering why fieldtrip outputs these
extra p-values at all? Do they serve some other purpose that I'm just
missing? (The tutorial's wording "*for the statistical inferential
decision* ... only the p-value of the largest cluster is relevant" seems to
imply that there may be some *other* purpose for which the other p-values
are relevant, but I'm not sure that reading was intended, or what that
other purpose may be.)

Thank you!
Steve

