[FieldTrip] Calculating the variance of a group with given variances of each subject

Mon Sep 26 21:36:30 CEST 2022

Hi all,

I am currently writing my master's thesis using FieldTrip and might need a bit of your expertise for a little hurdle I encountered.

First a few words about what I want to do:
I have data from a MEG go/no-go study with multiple subjects and multiple trials per condition. I now want to plot the ERF for each condition, averaged across all subjects. Together with the ERF I also want to plot a measure of spread: the standard error, variance, confidence interval, or standard deviation. Something like this (Figure 3a):
https://www.researchgate.net/figure/ERP-results-including-P3-amplitudes-a-Grand-average-and-standard-errors-of-ERP_fig2_327651209
Regardless of which of these measures I end up using, I need to calculate the variance of the ERF at each time point.
To do this, I first used ft_timelockanalysis to calculate the average for each subject. ft_timelockanalysis returns not only the average but also the variance of the ERF. Afterwards I used ft_timelockgrandaverage to calculate the group average of the ERF.
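
Whichever measure ends up in the plot, each one follows from the variance and the number of observations behind it. The following is a minimal sketch in plain Python, not FieldTrip code. The function name and the input values are made up for illustration, and the confidence interval assumes a normal approximation, which may be too optimistic for a small number of subjects (a t-distribution would be the usual alternative there).

```python
from math import sqrt
from statistics import NormalDist

def spread_measures(variance, n, confidence=0.95):
    """Derive SD, standard error and CI half-width from a variance.

    `variance` is the variance at one time point and `n` the number of
    observations it is based on (hypothetical inputs for illustration).
    """
    sd = sqrt(variance)                 # standard deviation
    se = sd / sqrt(n)                   # standard error of the mean
    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return sd, se, z * se               # last value: half-width of the CI

# Made-up numbers: variance 4.0 estimated from 16 observations.
sd, se, ci_half_width = spread_measures(variance=4.0, n=16)
print(sd, se, ci_half_width)
```

The shaded band in such a plot would then be the average plus and minus either `se` or `ci_half_width` at each time point.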

And this is where my troubles begin. I understand how ft_timelockgrandaverage calculates the new average: it averages over the mean values of the subjects (it also accounts for the degrees of freedom). For the variance, however, it just calculates the variance of the subject means. I had expected it to also account for the variance within each subject.
In the extreme case this leads to the following:
Imagine you have two subjects with very close averages, avg(sub1)=1 and avg(sub2)=1.0001, but extremely high variances: var(sub1)=var(sub2)=100. Then the variance calculated by ft_timelockgrandaverage is almost zero, when in reality it should be something like 100.
Maybe this should be noted in ft_timelockgrandaverage, because it can lead to wrong statistics if the user isn't aware of this effect.
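
The two-subject example can be checked numerically. This is a minimal sketch in plain Python, not FieldTrip code. The variable names are made up, and it assumes that the group variance is the sample variance of the subject means, as described above.

```python
from statistics import mean, variance

# Values from the two-subject example in the text.
subject_means = [1.0, 1.0001]
subject_vars = [100.0, 100.0]

# Variance across the subject means only (between-subject variability).
between_var = variance(subject_means)

# Average of the within-subject (across-trial) variances.
within_var = mean(subject_vars)

print(between_var, within_var)
```

The between-subject variance comes out many orders of magnitude smaller than the average within-subject variance of 100. The two numbers describe different things: the spread of the subject averages versus the spread of single trials within a subject.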

Back to my task:
This is surely not the variance I want to use for my ERF plots, so I searched for a different calculation.
On Wikipedia I found a formula for the so-called 'pooled variance' (at the bottom of the page, under sample-based statistics):
https://en.wikipedia.org/wiki/Pooled_variance

But I am not 100% happy with this pooled variance.
First of all, the degrees of freedom are not accounted for. Second, the mean is calculated differently than in ft_timelockgrandaverage: it is weighted by the number of trials of each subject. I don't know whether this different mean is a pro or a con. On the one hand, means with a higher precision because of a higher number of trials are weighted more strongly than means based on fewer trials. On the other hand, each subject in the group should be weighted equally.
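
The difference between the two means, and one standard way to combine per-subject statistics, can be sketched as follows. This is plain Python, not FieldTrip code. The function name and the data are made up, the per-subject variances are assumed to be sample variances (n - 1 denominator), and whether the formula matches the linked Wikipedia formula term for term is not checked here.

```python
from statistics import mean, variance

def combine_subjects(n_trials, means, variances):
    """Combine per-subject sample statistics into trial-level statistics.

    Returns the trial-weighted mean and the sample variance that all
    trials would have if they were concatenated across subjects.
    """
    total = sum(n_trials)
    weighted_mean = sum(n * m for n, m in zip(n_trials, means)) / total
    # Within-subject part: sums of squares around each subject's own mean.
    within = sum((n - 1) * v for n, v in zip(n_trials, variances))
    # Between-subject part: subject means around the trial-weighted mean.
    between = sum(n * (m - weighted_mean) ** 2
                  for n, m in zip(n_trials, means))
    return weighted_mean, (within + between) / (total - 1)

# Made-up amplitudes at one time point for two hypothetical subjects
# with unequal trial counts.
trials = [[1.0, 3.0, 2.0, 6.0], [10.0, 14.0]]
n_trials = [len(t) for t in trials]
means = [mean(t) for t in trials]
variances = [variance(t) for t in trials]

weighted_mean, combined_var = combine_subjects(n_trials, means, variances)
equal_weight_mean = mean(means)  # every subject counts the same

print(weighted_mean, equal_weight_mean, combined_var)
```

With unequal trial counts the trial-weighted mean and the equal-weight mean differ, which is the trade-off described above. The combined variance treats every trial as one observation, so it describes single-trial spread rather than variability between subjects.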

I also found a second formula on Stack Exchange, which also calculates a variance from variances:
https://stats.stackexchange.com/questions/300392/calculate-the-variance-from-variances

I am therefore uncertain which formula to use, because the two appear to differ.

I hope one of you is a little more experienced in statistics than I am and can help me find the correct or best calculation for this variance. The best case would be that it is already implemented in FieldTrip and I just missed it, but I can also implement it myself.

Best,
Sebastian
