[FieldTrip] Calculating the variance of a group with given variances of each subject

Fri Sep 30 13:46:40 CEST 2022

Hi Sebastian,

I would say that for visualization purposes (and for traditional parametric second-level statistics) you are interested in the variance across your units-of-observation, in this case across subjects. This is indeed what ft_timelockgrandaverage gives you. If you are after more sophisticated estimates, it smells a bit as if you want to move in the direction of mixed-effect models, which are gaining quite a bit of momentum these days. Since I am a bit old-school I have not eaten any cheese from that (http://www.dwotd.nl/2008/02/360-daar-heb-ik-geen-kaas-van-gegeten.html), so perhaps somebody else wants to chime in here.
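
For illustration, the across-subject variance and the standard error of the mean at each timepoint could be sketched as follows. This is plain Python and not FieldTrip code; the function name across_subject_sem and the toy numbers are made up for the example, and it assumes each subject contributes one averaged ERF on a common time axis.

```python
import math
import statistics


def across_subject_sem(subject_avgs):
    """Grand mean and standard error across subjects, per timepoint.

    subject_avgs: list of per-subject averages, each a list over timepoints.
    (Hypothetical helper, not part of FieldTrip.)
    """
    n_subjects = len(subject_avgs)
    grand_mean, grand_sem = [], []
    # zip(*...) groups the values of all subjects at the same timepoint
    for values in zip(*subject_avgs):
        grand_mean.append(statistics.fmean(values))
        # sample variance across subjects (n - 1 in the denominator)
        var_across = statistics.variance(values)
        grand_sem.append(math.sqrt(var_across / n_subjects))
    return grand_mean, grand_sem


# Toy data: 3 subjects, 2 timepoints (invented values)
grand_mean, grand_sem = across_subject_sem([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(grand_mean, grand_sem)
```

The mean plus and minus the standard error is what is typically drawn as the shaded band around a grand-average trace.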

Best wishes,
Jan-Mathijs

On 26 Sep 2022, at 21:36, Sebastian Neudek via fieldtrip <fieldtrip at science.ru.nl> wrote:

Hi all,

I am currently writing my master's thesis using FieldTrip and might need a bit of your expertise for a little hurdle I encountered.

First a few words about what I want to do:
I have data from an MEG go/nogo study: there are multiple subjects and multiple trials for each condition. I now want to plot the ERF for each condition averaged across all subjects. But together with the ERF I also want to plot the standard error, variance, confidence interval or standard deviation. Something like this (Figure 3a):
https://www.researchgate.net/figure/ERP-results-including-P3-amplitudes-a-Grand-average-and-standard-errors-of-ERP_fig2_327651209
Regardless of which of these measures I finally use, I need to calculate the variance of the ERF at each timepoint.
To do this I first used ft_timelockanalysis to calculate the averages of each subject. ft_timelockanalysis returns not only the average, but also the variance of the ERF. Afterwards I use ft_timelockgrandaverage to calculate the group average of the ERF.

And this is where my troubles begin. I understand how ft_timelockgrandaverage calculates the new average: it averages over the mean values of the subjects (it also accounts for the degrees of freedom). But for the variance it just calculates the variance of the means. I expected it to also account for the variances of each subject.
In the extreme case it leads to the following:
Imagine you have two subjects and you measure very close averages avg(sub1)=1 and avg(sub2)=1.0001. But their variances are extremely high: var(sub1)=var(sub2)=100. Then the variance calculated by ft_timelockgrandaverage is almost zero, when in reality it should be something like 100.
Maybe this should be noted in ft_timelockgrandaverage, because it can lead to wrong statistics if the user isn't aware of this effect.
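
The two numbers in this example can be reproduced with a small sketch in plain Python (not FieldTrip code; the variable names are made up). It only contrasts the variance of the subject means with the average within-subject variance, without claiming which of the two is appropriate for a given plot.

```python
import statistics

# Values from the two-subject example above
subject_means = [1.0, 1.0001]
subject_vars = [100.0, 100.0]

# Variance across the subject means (between-subject variability)
between = statistics.variance(subject_means)

# Average of the per-subject variances (within-subject, trial-to-trial variability)
within = statistics.fmean(subject_vars)

print(between)  # tiny, because the two means are nearly identical
print(within)   # large, because the trials within each subject scatter widely
```

The first number describes how much subjects differ from each other, the second how much trials differ within a subject, so they answer different questions.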

Back to my task:
This is surely not the variance I want to use for my ERF plots and therefore I searched for a different calculation.
I found on Wikipedia a formula for so-called 'pooled variance' (at the bottom of the page, for sample-based statistics):
https://en.wikipedia.org/wiki/Pooled_variance

But I am not 100% happy with this pooled variance.
First of all, the degrees of freedom are not accounted for. Second, the mean is calculated differently than in ft_timelockgrandaverage: it is weighted by the number of trials of each subject. I don't know whether this different mean is a pro or a con (on the one hand, means with a higher precision because of a higher number of trials are weighted more heavily than means with a lower trial number; on the other hand, each subject in the group should be weighted equally).
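
To make the difference between the two means concrete, here is a sketch in plain Python. It uses the standard decomposition for combining per-subject sample means and variances into the sample variance of all trials concatenated, with N - 1 in the denominator; it is not necessarily the exact formula from the linked page. The helper name combine_groups and the toy numbers are made up.

```python
def combine_groups(means, variances, counts):
    """Mean and sample variance of all trials concatenated across subjects.

    means, variances, counts: per-subject mean, sample variance and trial count.
    (Hypothetical helper for illustration.)
    """
    total = sum(counts)
    # Trial-weighted mean: subjects with more trials count more
    weighted_mean = sum(n * m for n, m in zip(counts, means)) / total
    # Sum of squares within each subject, recovered from the sample variances
    within_ss = sum((n - 1) * v for n, v in zip(counts, variances))
    # Sum of squares of the subject means around the weighted mean
    between_ss = sum(n * (m - weighted_mean) ** 2 for n, m in zip(counts, means))
    return weighted_mean, (within_ss + between_ss) / (total - 1)


# Toy data: two subjects with unequal trial counts (invented values)
means = [1.0, 3.0]
variances = [4.0, 4.0]
counts = [10, 30]

weighted_mean, combined_var = combine_groups(means, variances, counts)
unweighted_mean = sum(means) / len(means)  # every subject weighted equally

print(weighted_mean, unweighted_mean, combined_var)
```

With unequal trial counts the trial-weighted mean is pulled towards the subject with more trials, while the unweighted mean treats every subject the same.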

I also found a second formula on Stack Exchange, which also calculates a variance from variances:
https://stats.stackexchange.com/questions/300392/calculate-the-variance-from-variances

Therefore I am uncertain which formula to use, because they appear to differ.

I hope one of you is a little more experienced in statistics than I am and can help me find the correct or best calculation for this variance. The best case would be if it is already implemented in FieldTrip and I only missed it, but I can also implement it myself.

Best,
Sebastian

_______________________________________________
fieldtrip mailing list
https://mailman.science.ru.nl/mailman/listinfo/fieldtrip
https://doi.org/10.1371/journal.pcbi.1002202
