In this section we leave neural networks and turn to a theoretical analysis of differential correlations. We analyze information when there is a ‘pure’ $f'f'^T$ component and, just as importantly, when there is a not-so-pure component. We show that in the former case information saturates with $N$; in the latter case it doesn’t. We also show, somewhat surprisingly, that the optimal decoder doesn’t need to know about the $f'f'^T$ component of the correlations. In the Supplementary Modeling, we provide further insight into differential correlations by expressing them in terms of the eigenvectors and eigenvalues of the covariance matrix, and we use that analysis to understand why, and when, it’s hard to accurately estimate Fisher information.
Here we ask how the linear Fisher information scales with the number of neurons, $N$, when the covariance matrix contains a pure $f'f'^T$ component (the second term in equation (31)). Our starting point is a covariance matrix, $\Sigma_0(s)$, that doesn’t necessarily contain an $f'f'^T$ component. As in equation (3), the (linear) Fisher information associated with $\Sigma_0(s)$, denoted $I_0$, is given by
\[
I_0 = f'(s)^T\, \Sigma_0^{-1}(s)\, f'(s) \tag{29}
\]
where, as usual, $f(s)$ is a vector of tuning curves,
\[
f(s) \equiv \left( f_1(s), f_2(s), \ldots, f_N(s) \right) \tag{30}
\]
and a prime denotes a derivative with respect to $s$. Note that the information also depends on the stimulus, $s$; we suppress that dependence for clarity. To add a pure $f'f'^T$ component, we define a new covariance matrix, $\Sigma_\varepsilon(s)$, via
\[
\Sigma_\varepsilon(s) = \Sigma_0(s) + \varepsilon\, f'(s) f'(s)^T \tag{31}
\]
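One way to see why this component is called ‘pure’ (a sketch of our own, consistent with the motivation for differential correlations in the main text): suppose the effective stimulus fluctuates about $s$ by a small amount $\delta s$, with mean zero and variance $\varepsilon$, independently of the baseline noise $\eta$, which has covariance $\Sigma_0(s)$. Expanding the response $r = f(s + \delta s) + \eta$ to first order,
\[
r \approx f(s) + f'(s)\,\delta s + \eta
\qquad\Longrightarrow\qquad
\mathrm{Cov}(r) \approx \Sigma_0(s) + \varepsilon\, f'(s) f'(s)^T ,
\]
which is exactly the form of equation (31).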
The new information, denoted $I_\varepsilon$, is given by
\[
I_\varepsilon = f'(s)^T\, \Sigma_\varepsilon^{-1}(s)\, f'(s) \tag{32}
\]
To compute $I_\varepsilon$, we need the inverse of $\Sigma_\varepsilon$. As is easy to verify, this inverse is given by
\[
\Sigma_\varepsilon^{-1} = \Sigma_0^{-1}
- \frac{\varepsilon\, \Sigma_0^{-1} f' f'^T \Sigma_0^{-1}}{1 + \varepsilon\, f'^T \Sigma_0^{-1} f'} \tag{33}
\]
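For completeness, here is the verification (our own step; equation (33) is the Sherman–Morrison formula applied to equation (31)). Multiplying the two matrices and using $f'^T \Sigma_0^{-1} f' = I_0$,
\[
\Sigma_\varepsilon \Sigma_\varepsilon^{-1}
= \mathbb{1} + \varepsilon\, f' f'^T \Sigma_0^{-1}
\left( 1 - \frac{1}{1 + \varepsilon I_0} - \frac{\varepsilon I_0}{1 + \varepsilon I_0} \right)
= \mathbb{1} .
\]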
Inserting equation (33) into (32), we arrive at
\[
I_\varepsilon = I_0 - \frac{\varepsilon I_0^2}{1 + \varepsilon I_0}
= \frac{I_0}{1 + \varepsilon I_0} \tag{34}
\]
which is equation (5). Because $I_0$ typically grows without bound as $N \to \infty$, equation (34) implies that $I_\varepsilon$ saturates at $1/\varepsilon$, no matter how many neurons are added.
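As a numerical sanity check (our addition, not part of the original methods; the covariance and tuning-curve derivatives below are arbitrary stand-ins), the following sketch confirms that the information computed directly from $\Sigma_\varepsilon$ follows equation (34) and approaches $1/\varepsilon$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-3  # strength of the differential-correlation component

for N in (10, 100, 1000):
    # Arbitrary positive-definite baseline covariance Sigma_0.
    A = rng.standard_normal((N, N))
    Sigma0 = A @ A.T / N + np.eye(N)
    fp = rng.standard_normal(N)  # stand-in for the derivatives f'(s)

    I0 = fp @ np.linalg.solve(Sigma0, fp)            # equation (29)
    Sigma_eps = Sigma0 + eps * np.outer(fp, fp)      # equation (31)
    I_eps = fp @ np.linalg.solve(Sigma_eps, fp)      # equation (32)

    # I_eps agrees with I0 / (1 + eps*I0), equation (34), and is
    # therefore bounded above by 1/eps = 1000.
    print(f"N={N:5d}  I0={I0:10.2f}  I_eps={I_eps:8.2f}  "
          f"eq34={I0 / (1 + eps * I0):8.2f}")
```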
Perhaps surprisingly, although $f'f'^T$ correlations have a critical role in determining information, they are irrelevant for decoding, in the sense that they have no effect on the locally optimal linear estimator. To see this explicitly, note first of all that the locally optimal linear estimator, with weight vector denoted $w$, generates an estimate of the stimulus near some particular value, $s_0$, by operating linearly on the neural activity, $r$,
\[
\hat{s} = s_0 + w^T \left( r - f(s_0) \right) \tag{35}
\]
In the presence of the covariance matrix given in equation (31), the optimal weight vector is given by
\[
w = \frac{\Sigma_\varepsilon^{-1} f'}{f'^T \Sigma_\varepsilon^{-1} f'} \tag{36}
\]
where we have dropped, for clarity, the explicit dependence on $s_0$. Using equation (33), this reduces to
\[
w = \frac{\Sigma_0^{-1} f'}{f'^T \Sigma_0^{-1} f'} \tag{37}
\]
Thus, the locally optimal linear decoder does not need to know the size of the $f'f'^T$ correlations.
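To make the reduction from equation (36) to (37) explicit (our intermediate algebra, omitted in the original): applying equation (33) to $f'$ simply rescales $\Sigma_0^{-1} f'$,
\[
\Sigma_\varepsilon^{-1} f'
= \Sigma_0^{-1} f' \left( 1 - \frac{\varepsilon I_0}{1 + \varepsilon I_0} \right)
= \frac{\Sigma_0^{-1} f'}{1 + \varepsilon I_0} ,
\]
and the same scalar appears in the normalization, $f'^T \Sigma_\varepsilon^{-1} f' = I_0/(1 + \varepsilon I_0)$, so it cancels in equation (36); the $\varepsilon$-dependence drops out entirely.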
In hindsight this makes sense: $f'f'^T$ correlations shift the hill of activity, and there is, quite literally, nothing any decoder can do about this. This suggests that these correlations are in some sense special. To determine just how special, we ask what happens when we add correlations in a different direction, say correlations of the form $uu^T$, where $u$ is not parallel to $f'$. In that case, the covariance matrix becomes (with a normalization added for convenience only)
\[
\Sigma_\varepsilon(s) = \Sigma_0(s) + \varepsilon\, u u^T \tag{38}
\]
where $u$ is scaled so that $u^T \Sigma_0^{-1}(s)\, u = I_0$ (this is the normalization referred to above).
Repeating the steps leading to equation (34), we find that
\[
I_\varepsilon = I_0\, \frac{1 + \varepsilon I_0 \sin^2\theta}{1 + \varepsilon I_0} \tag{39}
\]
where $I_0$ is defined in equation (29) and $\theta$ is the angle between $f'$ and $u$ with respect to the $\Sigma_0^{-1}$ inner product,
\[
\cos\theta = \frac{f'^T \Sigma_0^{-1} u}
{\left( f'^T \Sigma_0^{-1} f' \right)^{1/2} \left( u^T \Sigma_0^{-1} u \right)^{1/2}} \tag{40}
\]
Whenever $\theta \neq 0$, meaning $u$ is not parallel to $f'(s)$, information does not saturate as $N$ goes to infinity. Thus, in the large-$N$ limit, $f'(s)f'(s)^T$ correlations are the only ones that cause saturation.
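To spell out the limit (our step; it assumes, as above, that $I_0$ grows without bound with $N$): from equation (39),
\[
I_\varepsilon \;\approx\; I_0 \sin^2\theta + \frac{\cos^2\theta}{\varepsilon}
\qquad \text{for } \varepsilon I_0 \gg 1 ,
\]
so the information keeps growing in proportion to $I_0$ whenever $\sin\theta \neq 0$, and saturates at $1/\varepsilon$ only when $u$ is parallel to $f'$, that is, when $\theta = 0$.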
A Supplementary Methods Checklist is available.