[FieldTrip] impact of skewed power distributions on data analysis

Mon Dec 19 23:07:07 CET 2016

I think this paper is relevant to this discussion.

Grandchamp, R., & Delorme, A. (2011). Single-Trial Normalization for
Event-Related Spectral Decomposition Reduces Sensitivity to Noisy
Trials. *Frontiers in Psychology*, *2*, 236. http://doi.org/10.3389/fpsyg.2011.00236

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3183439/

On 19 December 2016 at 13:08, Teresa Madsen <tmadsen at emory.edu> wrote:

> I appreciate everyone's feedback, but I still wonder if something is being
> missed.  I understand that non-normally distributed power values may be
> less of an issue when performing non-parametric stats, or even a
> paired-samples t-test on difference values, which may be normally
> distributed even when the raw data aren't.  However, my concern arises
> even before these statistical comparisons are made: whenever freq-type
> data are averaged across time points, frequencies, trials, electrodes,
> subjects, etc.  That includes any use of the following configuration
> options, and probably others:
>
> ft_freqanalysis:      cfg.keeptrials or cfg.keeptapers = 'no'
> ft_freqgrandaverage:  cfg.keepindividual = 'no'
> ft_freqstatistics:    cfg.avgoverchan, cfg.avgovertime, or cfg.avgoverfreq = 'yes'
> ft_freqbaseline:      cfg.baseline = anything but 'no'
>
> In each case, if raw power values are averaged, the result will be inflated
> by their positive skew.  Maybe it's not a huge problem if all of the data are
> treated identically, but the specific case that triggered my concern was in
> ft_freqbaseline, where the individual time-frequency bins are compared to
> the mean over time for the baseline period.  For example, when using
> cfg.baselinetype = 'db', as Giuseppe Pellizzer suggested, the output freq
> data do indeed have a more normal distribution over time, but the mean
> over the baseline time period is computed *before* the log transform, while
> the distribution is still highly skewed:
>
>   meanVals = repmat(nanmean(data(:,:,baselineTimes), 3), [1 1 size(data, 3)]);
>   data = 10*log10(data ./ meanVals);
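The following is a minimal Python/NumPy sketch of the ordering issue in the snippet above. It is not FieldTrip code. The function names and the synthetic chan x freq x time data are hypothetical, and it assumes that raw power at each frequency is exponentially distributed over time, as the poster reports. The two functions differ only in whether the baseline mean is taken before or after the log:

```python
import numpy as np

def db_mean_then_log(power, baseline):
    """dB change relative to the arithmetic mean of the baseline (mean taken before the log)."""
    ref = np.nanmean(baseline, axis=-1, keepdims=True)  # average raw power over baseline time
    return 10 * np.log10(power / ref)

def db_log_then_mean(power, baseline):
    """dB change relative to the mean of the baseline's dB values (log taken before the mean)."""
    ref_db = np.nanmean(10 * np.log10(baseline), axis=-1, keepdims=True)
    return 10 * np.log10(power) - ref_db

# Synthetic noise-only data, chan x freq x time. Assumption: power at each
# frequency is exponentially distributed over time.
rng = np.random.default_rng(0)
scale = np.array([1.0, 0.25, 0.05])[None, :, None]  # arbitrary 1/f-like level per frequency
baseline = rng.exponential(1.0, size=(2, 3, 20000)) * scale
trials = rng.exponential(1.0, size=(2, 3, 20000)) * scale  # same distribution: no real change

print(np.mean(db_mean_then_log(trials, baseline)))  # roughly -2.5 dB despite no change
print(np.mean(db_log_then_mean(trials, baseline)))  # roughly 0 dB
```

Under this assumption, taking the mean first produces an apparent decrease even though the trial data come from the same distribution as the baseline.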
>
> That's what I had originally done when analyzing data for my SfN poster,
> when I realized that background noise, which shouldn't have changed much
> from baseline, mostly showed a change of about -3 dB relative to baseline.
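Assume raw power P at one frequency is exponentially distributed over time with mean μ. This is the chi-squared distribution with 2 degrees of freedom expected for the squared magnitude of a complex Gaussian coefficient, and it is an assumption, not a property established for these recordings. Under that assumption, the expected offset can be worked out in closed form. This is a back-of-the-envelope sketch, not an analysis of the poster data:

```latex
\begin{aligned}
P &\sim \mathrm{Exp}(\mu), \qquad \mathbb{E}[\ln P] = \ln\mu - \gamma, \qquad \gamma \approx 0.5772 \ \text{(Euler--Mascheroni)} \\
\mathbb{E}\!\left[10\log_{10}\frac{P}{\mu}\right] &= \frac{10}{\ln 10}\bigl(\mathbb{E}[\ln P] - \ln\mu\bigr) = -\frac{10\,\gamma}{\ln 10} \approx -2.51\ \text{dB} \\
10\log_{10}\frac{\operatorname{median}(P)}{\mu} &= 10\log_{10}(\ln 2) \approx -1.59\ \text{dB}
\end{aligned}
```

So if nothing changes and power is exactly exponential, dB values computed against a mean-based baseline are expected to average about -2.5 dB. That is the same order of magnitude as the roughly -3 dB shift described above.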
>
> Now, I've realized I'm seeing this as more of a problem than others
> because of another tweak I made, which was to use a long, separate baseline
> recording to normalize my trial data, rather than a short pre-trial period
> as ft_freqbaseline is designed to do.  Averaging over a few hundred
> milliseconds for a baseline power estimate might be okay, because the power
> values at those time points are computed from overlapping stretches of the
> original data anyway, which probably makes them less skewed, but also (it
> seems to me) more arbitrary and prone to error.  I already offered my
> custom function BLnorm.m to one
> person who was asking about this issue of normalizing to a separate
> baseline recording, and I would be happy to contribute it to FieldTrip if
> others would appreciate it.
>
> Since a few people suggested using the median, and it is also suggested in Cohen's
> textbook
> <https://mitpress.mit.edu/books/analyzing-neural-time-series-data> as an
> alternative measure of the central tendency for skewed raw power values, I
> wonder if the simplest fix might be to add an option to select mean or
> median in each of the functions listed above.  Another possibility would be
> adding an option to transform the power values upon output from
> ft_freqanalysis.
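A minimal Python sketch of the proposed "mean or median" option follows. `average_power` and its `method` argument are hypothetical names, not FieldTrip's API. The "logmean" choice (the geometric mean, exp(mean(log(power)))) stands in for the alternative of log-transforming power before averaging. The example data are assumed to be exponentially distributed with a true mean of 1:

```python
import numpy as np

def average_power(power, axis=-1, method="mean"):
    """Hypothetical helper: collapse one axis of raw power with a selectable estimator.

    method = "mean"    -> arithmetic mean (the averaging described in this thread)
             "median"  -> median, robust to the long upper tail
             "logmean" -> exp(mean(log(power))), i.e. the geometric mean
    """
    if method == "mean":
        return np.nanmean(power, axis=axis)
    if method == "median":
        return np.nanmedian(power, axis=axis)
    if method == "logmean":
        return np.exp(np.nanmean(np.log(power), axis=axis))
    raise ValueError(f"unknown method: {method!r}")

rng = np.random.default_rng(1)
power = rng.exponential(1.0, size=100000)  # assumed exponential power over time, true mean 1
for m in ("mean", "median", "logmean"):
    print(m, average_power(power, method=m))
# For exponential data these approach 1, ln(2) ~ 0.69 and exp(-gamma) ~ 0.56 respectively.
```

The three estimators disagree substantially on skewed data. Whichever one is chosen should therefore be applied consistently to both the baseline and the data being normalized.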
>
> Would anyone else find such changes useful?
>
> Thanks,
> Teresa
>
>
> On Wed, Dec 14, 2016 at 4:22 AM, Herring, J.D. (Jim) <
> J.Herring at donders.ru.nl> wrote:
>
>> In terms of statistics, what matters is the distribution of the values that
>> you actually run the statistics on. In the case of a paired-samples t-test
>> comparing two conditions, it is the distribution of difference values that
>> has to be normal. Given two similarly non-normal distributions, the
>> difference values are often close to normally distributed, posing no
>> complications for a regular parametric test.
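The simulation below is a quick Python/NumPy illustration of this point. It assumes a hypothetical paired design in which both conditions share the same exponential distribution. Each condition on its own is strongly skewed, but the paired differences are symmetric:

```python
import numpy as np

def skewness(x):
    """Sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(2)
n_pairs = 200000
# Hypothetical paired design: the same skewed (exponential) distribution in both conditions.
cond_a = rng.exponential(1.0, size=n_pairs)
cond_b = rng.exponential(1.0, size=n_pairs)
diff = cond_a - cond_b

print(skewness(cond_a))  # close to 2, the skewness of an exponential distribution
print(skewness(diff))    # close to 0: the differences are symmetric
```

Symmetric is not the same as normal. The difference of two independent exponentials follows a Laplace distribution, whose tails are heavier than Gaussian. This helps the t-test but does not make the differences exactly normal.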
>>
>>
>>
>> The non-parametric tests offered in fieldtrip indeed do not assume
>> normality, so you should have no problem there either.
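For intuition about why permutation tests avoid the normality assumption, here is a minimal sign-flip permutation test for paired data in Python. It is a generic textbook version, not FieldTrip's Monte Carlo or cluster-based implementation, and the example data and function name are hypothetical:

```python
import numpy as np

def paired_permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test for paired data (statistic: mean difference).

    Under the null hypothesis that the condition labels are exchangeable within
    each pair, every difference is equally likely to be positive or negative,
    so no normality assumption is required.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_perm, d.size))  # random sign flips
    null = (signs * d).mean(axis=1)                         # permutation distribution
    # Add 1 to numerator and denominator so the observed data count as one permutation.
    p = (np.sum(np.abs(null) >= abs(observed)) + 1) / (n_perm + 1)
    return observed, p

# Usage on skewed, hypothetical per-subject power values in two conditions.
rng = np.random.default_rng(7)
cond_b = rng.exponential(1.0, size=25)
cond_a = cond_b * rng.lognormal(mean=0.3, sigma=0.2, size=25)  # multiplicative increase
print(paired_permutation_test(cond_a, cond_b))
```

The null distribution is built from the data themselves by randomly flipping the sign of each pair's difference. That is why the skew of the raw values does not invalidate the p-value.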
>>
>>
>>
>>
>>
>> *From:* fieldtrip-bounces at science.ru.nl [mailto:fieldtrip-bounces at science.ru.nl] *On Behalf Of* Alik Widge
>> *Sent:* Tuesday, December 13, 2016 3:10 PM
>> *To:* FieldTrip discussion list <fieldtrip at science.ru.nl>
>> *Subject:* Re: [FieldTrip] impact of skewed power distributions on data analysis
>>
>>
>>
>> In this, Teresa is right and we have observed this in our own EEG data --
>> depending on one's level of noise and number of trials/patients, the mean
>> can be a very poor estimator of central tendency. My students are still
>> arguing about what we really want to do with it, but at least one of them
>> has shifted to using the median as a matter of course for baseline
>> normalization.
>>
>>
>> Alik Widge
>> alik.widge at gmail.com
>> (206) 866-5435
>>
>>
>>
>> On Mon, Dec 12, 2016 at 6:45 PM, Teresa Madsen <tmadsen at emory.edu> wrote:
>>
>> That may very well be true; to be honest, I haven't looked that deeply
>> into the stats offerings yet. However, my plan is to express each
>> electrode's experimental data in terms of change from its respective
>> baseline recording before attempting any group averaging or statistical
>> testing, and this problem shows up first in the baseline correction step,
>> where FieldTrip averages raw power over time.
>>
>> ~Teresa
>>
>>
>>
>> On Mon, Dec 12, 2016 at 4:56 PM Nicholas A. Peatfield <
>> nick.peatfield at gmail.com> wrote:
>>
>> Correct me if I'm wrong, but if you are using the non-parametric
>> statistics implemented by fieldtrip, the data does not need to be normally
>> distributed.
>>
>>
>>
>> On 12 December 2016 at 13:39, Teresa Madsen <tmadsen at emory.edu> wrote:
>>
>> No, sorry, that's not what I meant, but thanks for giving me the
>> opportunity to clarify. Of course everyone is familiar with the 1/f pattern
>> across frequencies, but power over time (and, according to the poster, also
>> across space) also follows an extremely skewed, negative exponential
>> distribution. I probably confused everyone by trying to show too much data
>> in my figure, but each color represents the distribution of power values
>> for a single frequency over time, using a histogram and a line above with
>> circles at the mean +/- one standard deviation.
>>
>> My main point was that the mean is not representative of the central
>> tendency of such an asymmetrical distribution of power values over time.
>> It's even more obvious which measure better represents the actual
>> distributions when I plot e^mean(logpower) on the raw plot and
>> log(mean(rawpower)) on the log plot, but that made the figure even more
>> busy and confusing.
>>
>> I hope that helps,
>> Teresa
>>
>>
>>
>> On Mon, Dec 12, 2016 at 3:47 PM Nicholas A. Peatfield <
>> nick.peatfield at gmail.com> wrote:
>>
>> Hi Teresa,
>>
>>
>>
>> I think what you are discussing is the 1/f scaling of the power
>> spectrum. This is one of the reasons that comparisons are made within
>> a band (e.g. alpha to alpha) and not between bands (e.g. alpha to gamma).
>> The assumption is that within a band there should be a relative change
>> against baseline, and this is what the statistics are performed on.
>> That is, baseline correction is assumed to use the mean for a specific
>> frequency and not a mean across frequencies.
>>
>>
>>
>> This leads to another point: when you select a frequency range for the
>> non-parametric statistics, you should not use 1-64 Hz but break it up
>> into bands.
>>
>>
>>
>> Hope my interpretation of your point is correct. I sent it individually,
>> as I wanted to make sure I followed your point.
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Nick
>>
>>
>>
>>
>>
>> On 12 December 2016 at 08:23, Teresa Madsen <tmadsen at emory.edu> wrote:
>>
>> FieldTrippers,
>>
>>
>>
>> While analyzing my data for the annual Society for Neuroscience meeting,
>> I developed a concern that was quickly validated by another poster (full
>> abstract copied and linked below) focusing on the root of the problem:
>>  neural oscillatory power is not normally distributed across time,
>> frequency, or space.  The specific problem I had encountered was in
>> baseline-correcting my experimental data, where, regardless of
>> cfg.baselinetype, ft_freqbaseline depends on the mean power over time.
>> However, I found that the distribution of raw power over time is so skewed
>> that the mean was not a reasonable approximation of the central tendency of
>> the baseline power, so it made most of my experimental data look like it
>> had decreased power compared to baseline.  The more I think about it, the
>> more I realize that averaging is everywhere in the way we analyze neural
>> oscillations (across time points, frequency bins, electrodes, trials,
>> subjects, etc.), and many of the standard statistics people use also rely
>> on assumptions of normality.
>>
>>
>>
>> The most obvious solution for me was to log-transform the data first, as
>> it appears to be fairly log-normal, and I always use log-scale
>> visualizations anyway.  Erik Peterson, middle author on the poster, agreed
>> that this would at least "restore (some) symmetry to the error
>> distribution."  I used a natural log transform, somewhat arbitrarily, to
>> differentiate it from the standard decibel transform included in FieldTrip as
>> cfg.baselinetype = 'db'.  The following figures compare the two distributions
>> across several frequency bands (using power values from a wavelet
>> spectrogram obtained from a baseline LFP recorded in rat prelimbic
>> cortex).  The lines at the top represent the mean +/- one standard
>> deviation for each frequency band, and you can see how those descriptive
>> stats are much more representative of the actual distributions in the log
>> scale.
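The sketch below gives a rough numerical picture of why mean +/- SD describes the log scale better. It uses Python/NumPy on synthetic data that are assumed to be exponentially distributed at one frequency, not on the LFP data behind the figures. For an exponential distribution the SD equals the mean, so in raw units the lower error bar lands at zero power. The natural log of an exponential variable is still somewhat left-skewed, which fits the "(some) symmetry" wording above:

```python
import numpy as np

def summarize(x):
    """Mean, mean minus one SD, and the fraction of samples below the mean."""
    m, s = x.mean(), x.std()
    return {
        "mean": m,
        "mean_minus_sd": m - s,
        "frac_below_mean": np.mean(x < m),
    }

rng = np.random.default_rng(4)
raw = rng.exponential(1.0, size=200000)  # assumed exponential power at one frequency
ln_power = np.log(raw)                    # natural-log transform, as in the email

raw_stats = summarize(raw)
log_stats = summarize(ln_power)
print(raw_stats)  # mean - SD sits near 0, and most samples fall below the mean
print(log_stats)  # fraction below the mean is closer to one half
```

In raw units about 63% of samples fall below the mean (1 - e^-1). On the log scale that fraction drops to roughly 43%, closer to the 50% of a symmetric distribution.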
>>
>>
>>
>>
>> 
>>
>> For my analysis, I also calculated a z-score on the log transformed power
>> to assess how my experimental data compared to the variability of the noise
>> in a long baseline recording from before conditioning, rather than a short
>> pre-trial baseline period, since I find that more informative than any of
>> FieldTrip's built-in baseline types.  I'm happy to share the custom
>> functions I wrote for this if people think it would be a useful addition to
>> FieldTrip.  I can also share more about my analysis and/or a copy of the
>> poster, if anyone wants more detail - I just didn't want to make this email
>> too big.
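The custom functions mentioned here (e.g. BLnorm.m) are not included in the thread. The Python sketch below is therefore only a hypothetical stand-in for the described idea: z-score the log power of trial data against the per-frequency mean and SD of log power from a long, separate baseline recording. The function name, array layout (freq x time), and synthetic exponential data are all assumptions:

```python
import numpy as np

def zscore_to_baseline(trial_power, baseline_power):
    """Hypothetical sketch: z-score log power against a separate baseline recording.

    Both inputs are freq x time arrays of raw power. The mean and SD are taken
    per frequency over the baseline's time axis, after a natural-log transform.
    """
    log_base = np.log(baseline_power)
    mu = log_base.mean(axis=-1, keepdims=True)
    sd = log_base.std(axis=-1, ddof=1, keepdims=True)
    return (np.log(trial_power) - mu) / sd

rng = np.random.default_rng(5)
scale = np.array([[1.0], [0.1], [0.01]])                  # arbitrary per-frequency power levels
baseline = rng.exponential(1.0, size=(3, 50000)) * scale  # long, separate baseline recording
trial = rng.exponential(1.0, size=(3, 2000)) * scale      # noise-only "trial" data

z = zscore_to_baseline(trial, baseline)
print(z.mean(axis=-1))  # near 0 at every frequency, since nothing changed

# Same computation in dB: identical z-scores, because 10*log10(x) = (10/ln 10) * ln(x)
# and a z-score is unchanged by rescaling.
base_db = 10 * np.log10(baseline)
z_db = (10 * np.log10(trial) - base_db.mean(axis=-1, keepdims=True)) / base_db.std(
    axis=-1, ddof=1, keepdims=True
)
print(np.allclose(z, z_db))  # True
```

A consequence of the last lines: once power is z-scored, the choice between a natural log and a decibel transform no longer matters.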
>>
>>
>>
>> Mostly, I'm just hoping to start some discussion here as to how to
>> address this.  I searched the wiki
>> <http://www.fieldtriptoolbox.org/development/zscores>, listserv archives
>> <https://mailman.science.ru.nl/pipermail/fieldtrip/2006-December/000773.html>
>> <https://mailman.science.ru.nl/pipermail/fieldtrip/2010-March/002718.html>,
>> and bugzilla <http://bugzilla.fieldtriptoolbox.org/show_bug.cgi?id=1574> for
>> anything related and came up with a few topics surrounding normalization
>> and baseline correction, but only skirting this issue.  It seems important,
>> so I want to find out whether others agree with my approach or already have
>> other ways of avoiding the problem, and whether FieldTrip's code needs to
>> be changed or just documentation added, or what?
>>
>>
>>
>> Thanks for any insights,
>>
>> Teresa
>>
>>
>>
>>
>> 271.03 / LLL17 - Neural oscillatory power is not Gaussian distributed
>> across time
>> <http://www.abstractsonline.com/pp8/#!/4071/presentation/24150>
>>
>> *Authors*
>>
>> **L. IZHIKEVICH*, E. PETERSON, B. VOYTEK;
>> Cognitive Sci., UCSD, San Diego, CA
>>
>> *Disclosures*
>>
>>  *L. Izhikevich:* None. *E. Peterson:* None. *B. Voytek:* None.
>>
>> *Abstract*
>>
>> Neural oscillations are important in organizing activity across the human
>> brain in healthy cognition, while oscillatory disruptions are linked to
>> numerous disease states. Oscillations are known to vary by frequency and
>> amplitude across time and between different brain regions; however, this
>> variability has never been well characterized. We examined human and animal
>> EEG, LFP, MEG, and ECoG data from over 100 subjects to analyze the
>> distribution of power and frequency across time, space and species. We
>> report that between data types, subjects, frequencies, electrodes, and
>> time, an inverse power law, or negative exponential distribution, is
>> present in all recordings. This is contrary to, and not compatible with,
>> the Gaussian noise assumption made in many digital signal processing
>> techniques. The statistical assumptions underlying common algorithms for
>> power spectral estimation, such as Welch's method, are being violated
>> resulting in non-trivial misestimates of oscillatory power. Different
>> statistical approaches are warranted.
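The abstract does not name a specific remedy. One readily available alternative for Welch-style estimates is to average segment periodograms with the median instead of the mean. SciPy's `scipy.signal.welch` supports this through its `average` argument ('mean' or 'median', available in recent SciPy versions). The sketch below uses synthetic white noise with a hypothetical high-amplitude transient, not real recordings:

```python
import numpy as np
from scipy.signal import welch

fs = 1000.0
rng = np.random.default_rng(6)
clean = rng.standard_normal(100_000)  # 100 s of white Gaussian noise at 1 kHz
burst = clean.copy()
burst[:2000] *= 20                    # hypothetical high-amplitude transient in the first 2 s

ratios = {}
for label, sig in (("clean", clean), ("burst", burst)):
    f, p_mean = welch(sig, fs=fs, nperseg=256, average="mean")
    _, p_median = welch(sig, fs=fs, nperseg=256, average="median")
    # Median over frequency bins of the mean-to-median PSD ratio.
    ratios[label] = np.median(p_mean / p_median)

print(ratios)  # ~1 for clean noise; the transient inflates the mean-averaged PSD
```

SciPy applies a bias correction to the median average, so for stationary Gaussian noise the two estimates agree. When a few segments carry outlying power, only the mean-averaged spectrum is pulled upward.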
>>
>>
>>
>> --
>>
>> Teresa E. Madsen, PhD
>> Research Technical Specialist:  *in vivo* electrophysiology & data analysis
>>
>> Division of Behavioral Neuroscience and Psychiatric Disorders
>> Yerkes National Primate Research Center
>>
>> Emory University
>>
>> Rainnie Lab, NSB 5233
>> 954 Gatewood Rd. NE
>> Atlanta, GA 30329
>>
>> (770) 296-9119
>>
>> braingirl at gmail.com
>>
>> https://www.linkedin.com/in/temadsen
>>
>>
>>
>>
>>
>> _______________________________________________
>> fieldtrip mailing list
>> fieldtrip at donders.ru.nl
>> https://mailman.science.ru.nl/mailman/listinfo/fieldtrip
>>
>>
>>
>>
>>
>> --
>>
>> Nicholas Peatfield, PhD
>>
>
>
>
> --
> Teresa E. Madsen, PhD
> Division of Behavioral Neuroscience and Psychiatric Disorders
> Yerkes National Primate Research Center
> Emory University
> Rainnie Lab, NSB 5233
> 954 Gatewood Rd. NE
> Atlanta, GA 30329
> (770) 296-9119
>
>

-- 
Nicholas Peatfield, PhD