Reconstruction accuracy was quantified separately for each stimulus component by computing the correlation coefficient (Pearson's r) between the reconstructed and original stimulus component. For each participant, this yielded 32 individual correlation coefficients for the 32-channel spectrogram model and 60 correlation coefficients for the 60-channel rate-scale modulation model (defined in Speech Stimuli). Overall reconstruction accuracy is reported as the mean correlation over all stimulus components.
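The per-component accuracy measure can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the function name `channelwise_pearson`, the array shapes, and the toy noise level are all assumptions made for the example.

```python
import numpy as np

def channelwise_pearson(original, reconstructed):
    """Pearson's r for each stimulus component.

    Both inputs have shape (n_timepoints, n_components); the result has
    one correlation coefficient per component (column).
    """
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    # Center every component over time.
    o = original - original.mean(axis=0)
    r = reconstructed - reconstructed.mean(axis=0)
    # Covariance divided by the product of the standard deviations.
    numerator = (o * r).sum(axis=0)
    denominator = np.sqrt((o ** 2).sum(axis=0) * (r ** 2).sum(axis=0))
    return numerator / denominator

# Synthetic stand-in for a 32-channel spectrogram and a noisy reconstruction.
rng = np.random.default_rng(0)
stim = rng.standard_normal((500, 32))
recon = stim + rng.standard_normal((500, 32))

r_per_channel = channelwise_pearson(stim, recon)  # 32 coefficients
```

For the 60-channel rate-scale model the same function would return 60 coefficients, one per modulation component.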
To compare modulation-based and spectrogram-based accuracy directly, the reconstructions must be compared in the same stimulus space. The linear spectrogram reconstruction was therefore projected into the rate-scale modulation space (using the modulation filterbank as described in Speech Stimuli). This transformation provides an estimate of the modulation content of the spectrogram reconstruction and allows direct comparison with the modulation reconstruction. The transformed reconstruction was then correlated with the 60 rate-scale components of the original stimulus. Accuracy as a function of rate (Figure 5A) was calculated by averaging over the scale dimension. Positive and negative rates were also averaged unless otherwise shown. Comparison of reconstruction accuracy for a subset of data in the full rate-scale-frequency modulation space yielded similar results. To impose additivity and approximate a normal sampling distribution of the correlation coefficient statistic, Fisher's z-transform was applied to correlation coefficients prior to tests of statistical significance and prior to averaging over stimulus channels and participants. The inverse z-transform was then applied for all reported mean r values.
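The averaging procedure described above (Fisher's z-transform, mean, inverse transform) can be sketched as below. The grid layout is hypothetical: the example assumes the 60 rate-scale components are arranged as 10 rates (5 negative, then 5 positive) by 6 scales, which the text does not specify, and the correlation values are random placeholders.

```python
import numpy as np

def fisher_mean(r, axis=None):
    """Average correlations in Fisher z space, then map back to r."""
    r = np.asarray(r, dtype=float)
    # Clip so that arctanh stays finite if any |r| is exactly 1.
    z = np.arctanh(np.clip(r, -0.999999, 0.999999))
    return np.tanh(z.mean(axis=axis))

# Hypothetical layout (an assumption): rows are rates ordered from the most
# negative to the most positive, columns are scales; 10 x 6 = 60 components.
rng = np.random.default_rng(1)
r_grid = rng.uniform(0.1, 0.6, size=(10, 6))  # placeholder r values

# Accuracy as a function of rate: average over the scale dimension.
r_by_rate = fisher_mean(r_grid, axis=1)  # shape (10,)

# Fold negative and positive rates of equal magnitude together.
negative = r_grid[:5][::-1]  # reorder so row i matches positive row i
positive = r_grid[5:]
r_by_abs_rate = fisher_mean(np.stack([negative, positive]), axis=(0, 2))

# Overall accuracy across all 60 components.
overall = fisher_mean(r_grid)
```

Averaging in z space rather than r space matters most when the coefficients are large or widely spread; for small, similar values the two means are nearly identical.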
To visualize the modulation-based reconstruction in the spectrogram domain (Figure 7B), the 4-D modulation representation needs to be inverted [18]. If both magnitude and phase responses are available, the 2-D spectrogram can be restored by a linear inverse filtering operation [18]. Here, only the magnitude response is reconstructed directly from neural activity. In this case, the spectrogram can be recovered approximately from the magnitude-only modulation representation using an iterative projection algorithm and an overcomplete set of modulation filters as described in Chi et al. [18]. Figure 7B displays the average of 100 random initializations of this algorithm. This approach is subject to non-neural errors due to the phase-retrieval problem (i.e., the algorithm does not perfectly recover the spectrogram, even when applied to the original stimulus) [18]. Therefore, quantitative comparisons with the spectrogram-based reconstruction were performed in the modulation space.
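The general idea of recovering a signal from magnitude-only measurements by iterative projection can be illustrated with a toy example. This is not the algorithm of Chi et al. [18]: as a loudly stated assumption, the overcomplete modulation filterbank is replaced here by a plain 2-D FFT, the "spectrogram" is random non-negative data, and the function names are hypothetical. The sketch alternates between enforcing the known magnitudes in the transform domain and enforcing a real, non-negative signal in the spectrogram domain, then averages the results of 100 random initializations.

```python
import numpy as np

def magnitude_error(x, target_mag):
    """Mismatch between the transform magnitude of x and the target."""
    return np.linalg.norm(np.abs(np.fft.fft2(x)) - target_mag)

def retrieve_from_magnitude(target_mag, x0, n_iter=50):
    """Alternating projections (error-reduction style phase retrieval)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        # Transform-domain projection: keep current phase, impose magnitude.
        phase = np.exp(1j * np.angle(np.fft.fft2(x)))
        x = np.fft.ifft2(target_mag * phase)
        # Signal-domain projection: real-valued and non-negative.
        x = np.maximum(x.real, 0.0)
    return x

rng = np.random.default_rng(2)
spec = rng.random((16, 40))        # toy non-negative "spectrogram"
mag = np.abs(np.fft.fft2(spec))    # magnitude-only representation

# Run the algorithm from 100 random starting points and average.
inits = [rng.random(spec.shape) for _ in range(100)]
estimates = [retrieve_from_magnitude(mag, x0) for x0 in inits]
mean_estimate = np.mean(estimates, axis=0)
```

Even in this toy setting the recovery is generally imperfect, and Fourier-magnitude problems have their own ambiguities (for example circular shifts), so the averaged output is illustrative only. The point it shares with the text is that magnitude-only inversion introduces errors unrelated to the neural data.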
Reconstruction accuracy was cross-validated and the reported correlation is the average over all resamples (see Cross-Validation) [53]. Standard error is computed as the standard deviation of the resampled distribution [17]. The reported correlations are not corrected to account for the noise ceiling on prediction accuracy [16], which limits the amount of potentially explainable variance. An ideal model would not achieve perfect prediction accuracy of r = 1.0 due to the presence of random noise that is unrelated to the stimulus. With repeated trials of identical stimuli, it is possible to estimate trial-to-trial variability to correct for the amount of potentially explainable variance [56]. In the experiments reported here, a sufficient number of trial repetitions (>5) was generally unavailable for a robust estimate, and uncorrected values are therefore reported.
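The noise ceiling mentioned above can be illustrated with a small simulation. This is a generic sketch under assumed conditions (Gaussian signal and additive Gaussian noise of equal variance, chosen only for illustration), not an estimate for the data in this study: even a model that predicts the stimulus-driven part exactly correlates with the noisy measurement at less than r = 1.0.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20000

signal = rng.standard_normal(n)   # stimulus-driven component (assumed)
noise = rng.standard_normal(n)    # random noise unrelated to the stimulus
measured = signal + noise

# An "ideal" model predicts the signal perfectly, yet its correlation with
# the measured data is limited by the noise.
r_ideal = np.corrcoef(signal, measured)[0, 1]

# For independent additive noise the ceiling is
# sqrt(var_signal / (var_signal + var_noise)); here both variances are 1.
expected_ceiling = np.sqrt(1.0 / (1.0 + 1.0))
```

With equal signal and noise variance the ceiling is about 0.71, so an uncorrected correlation in this toy case would understate how well the model captures the explainable part of the data.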