# [FieldTrip] Performing statistics based on phase/amplitude correlation

Eelke Spaak eelke.spaak at donders.ru.nl
Fri Mar 18 13:08:25 CET 2011

```Dear FieldTrip community,

At the moment, I am analysing some data to see whether a correlation
exists between phase and amplitude over different frequencies, and in
different channels. Computing the signals of interest (i.e., the phase
and amplitude time series) works fine, and computing the correlation
is also easily done with Matlab's corrcoef function. Now comes the
tricky part. I would like to use the FT statistics framework to assess
the significance of the found correlation, and use clustering in
frequency-frequency space to correct for multiple comparisons. I
understand that this requires a statfun_correlationPearson (or
something like it), and I do not think I'll encounter any problems
implementing this. Before I start doing so, however, I am not sure how
to pass the design and the data to the higher-level statistics
functions, such as ft_freqstatistics.

My data structure is custom-imported, and contains a single 100s-long
trial, on which the phase/amplitude analysis should be conducted:

data =
hdr: [1x1 struct]
fsample: 1000
trial: {[24x100000 double]}
time: {[1x100000 double]}
label: {24x1 cell}
cfg: [1x1 struct]

The phase and amplitude time series are generated by convolution with
Morlet wavelets using ft_freqanalysis:

% perform wavelet convolution for amplitude data
cfg = [];
cfg.channel = chansAmpl;
cfg.method = 'wavelet';
cfg.output = 'fourier';
cfg.keeptrials = 'yes';
cfg.width = 7;
cfg.gwidth = 4;
cfg.foi = freqAmpl;
cfg.toi = data.time{1}; % compute estimate for each sample
tfr = ft_freqanalysis(cfg, data);

% perform wavelet convolution for phase data
cfg.channel = chansPh;
cfg.foi = freqPh;
tfrPh = ft_freqanalysis(cfg, data);

where chansAmpl = {'CH10'}, chansPh = {'CH17'}, freqAmpl = 3:5:98,
freqPh = 2:2:30. Simply computing the correlation of interest is done
in the following manner:

% extract freqs X samples matrices to compute correlations on
phSig = real(squeeze(tfrPh.fourierspctrm(1,match_str(tfrPh.label,
chansPh(k)), :, :)));
amplSig = abs(squeeze(tfr.fourierspctrm(1,match_str(tfr.label,
chansAmpl(l)), :, :)));

% compute correlations
r = corr(phSig', amplSig', 'rows', 'pairwise'); % use only non-NaN rows

The correlation matrix r I get out of this nicely corresponds to the
results I found by using algorithms implemented elsewhere: so far so
good. My (unsuccessful) attempts to use the statistics framework have
been variations on the following:

% add amplitude spectrum to the TFR (which now only has fourierspctrm)
tfr.amplspctrm = abs(tfr.fourierspctrm);

cfg = [];
cfg.method = 'montecarlo';
cfg.statistic = 'corr'; % I have made a (now empty) statfun_corr
cfg.parameter = 'amplspctrm';
cfg.numrandomization = 100;
cfg.design = real(tfrPh.fourierspctrm);
stat = ft_freqstatistics(cfg, tfr);

For your information, the tfr and cfg fields look like this before the
call to ft_freqstatistics:

cfg =
method: 'montecarlo'
statistic: 'corr'
parameter: 'amplspctrm'
numrandomization: 100
design: [4-D double]

tfr =
label: {'CH10'}
dimord: 'rpttap_chan_freq_time'
freq: [3 8 13 18 23 28 33 38 43 48 53 58 63 68 73 78 83 88 93 98]
time: [1x100000 double]
fourierspctrm: [4-D double]
cumtapcnt: [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
cfg: [1x1 struct]
amplspctrm: [4-D double]

The statfun_corr gets a dat variable, that appears to be a column
vector constructed by concatenating all the samples in the amplitude
spectrum. How should my function treat this? I am aware that
specifying a 4D design matrix is a bit odd, but it does contain my 15
regressors X 100000 samples (X 2 singleton dimensions that correspond
to the 1 trial and 1 channel that I analysed). If I squeeze() the
matrix before calling ft_freqstatistics, I get an error 'the size of
the design matrix does not match the number of observations in the
data'.

I realize my question is a bit vague, but I am somewhat lost as to how
to proceed. In short, what I want is this: (1) compute a correlation
(across samples) for each phase time series with each amplitude time
series (each time series corresponds to a single frequency); (2)
shuffle the phase time series around, computing correlations on the
shuffled data to assess significance of the observed correlation; (3)
prune away correlations that are isolated in frequency-frequency
space, retain only large clusters of correlations.

All the documentation about the cluster-based statistics is focused on
comparing variables that differ across trials and/or subjects, and my
'problem' might be that I only have a single trial. Chopping up the
data into 100000 trials, each 1 sample long, is maybe a possibility?
But it seems a bit artificial.

(It would actually probably not be very difficult to implement my
desired functionality 'by hand', but I guess it should be possble
within FT's statistics framework. That would have quite some