In this section we leave neural networks and turn to a theoretical analysis of differential correlations. We analyze information when there is a ‘pure’ $f'f'^T$ component and, just as importantly, when that component is not pure. We show that in the former case information saturates with $N$; in the latter case it doesn't. We also show, somewhat surprisingly, that the optimal decoder doesn't need to know about the $f'f'^T$ component of the correlations. In the Supplementary Modeling, we provide further insight into differential correlations by expressing them in terms of the eigenvectors and eigenvalues of the covariance matrix, and we use that analysis to understand why, and when, it's hard to accurately estimate Fisher information.
Here we ask how the linear Fisher information scales with the number of neurons, $N$, when the covariance matrix contains a pure $f'f'^T$ component (the second term in equation (31)). Our starting point is a covariance matrix, $\Sigma_0(s)$, that doesn't necessarily contain an $f'f'^T$ component. As in equation (3), the (linear) Fisher information associated with $\Sigma_0(s)$, denoted $I_0$, is given by
$$I_0 = f'(s)^T \Sigma_0^{-1}(s)\, f'(s) \tag{29}$$
where, as usual, $f(s)$ is a vector of tuning curves,
$$f(s) \equiv \left(f_1(s), f_2(s), \ldots, f_N(s)\right)^T \tag{30}$$
and a prime denotes a derivative with respect to $s$. Note that the information also depends on the stimulus, $s$; we suppress that dependence for clarity. To add a pure $f'f'^T$ component, we define a new covariance matrix, $\Sigma_\varepsilon(s)$, via
$$\Sigma_\varepsilon(s) = \Sigma_0(s) + \varepsilon f'(s) f'^T(s) \tag{31}$$
The new information, denoted $I_\varepsilon$, is given by
$$I_\varepsilon = f'(s)^T \Sigma_\varepsilon^{-1}(s)\, f'(s) \tag{32}$$
To compute $I_\varepsilon$, we need the inverse of $\Sigma_\varepsilon$. As is easy to verify, this inverse is given by
$$\Sigma_\varepsilon^{-1}(s) = \Sigma_0^{-1}(s) - \frac{\varepsilon}{1+\varepsilon I_0}\, \Sigma_0^{-1}(s) f'(s) f'^T(s)\, \Sigma_0^{-1}(s) \tag{33}$$
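This is an instance of the Sherman–Morrison formula for a rank-one update. To check it, multiply the proposed inverse by $\Sigma_\varepsilon(s)$; using $f'^T \Sigma_0^{-1} f' = I_0$, the rank-one terms cancel:

$$\left(\Sigma_0 + \varepsilon f' f'^T\right)\left(\Sigma_0^{-1} - \frac{\varepsilon}{1+\varepsilon I_0}\, \Sigma_0^{-1} f' f'^T \Sigma_0^{-1}\right) = I + \left(\varepsilon - \frac{\varepsilon}{1+\varepsilon I_0}\left(1 + \varepsilon I_0\right)\right) f' f'^T \Sigma_0^{-1} = I$$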
Inserting equation (33) into (32), we arrive at
$$I_\varepsilon = I_0 - \frac{\varepsilon I_0^2}{1+\varepsilon I_0} = \frac{I_0}{1+\varepsilon I_0} \tag{34}$$
which is equation (5).
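As a sanity check, equation (34) is easy to confirm numerically. The sketch below uses an arbitrary positive-definite $\Sigma_0$ and a random $f'$ (illustrative assumptions, not the model analyzed here); the directly computed $I_\varepsilon$ matches $I_0/(1+\varepsilon I_0)$ and stays below the ceiling $1/\varepsilon$ as $N$ grows:

```python
import numpy as np

def linear_fisher(f_prime, Sigma):
    """Linear Fisher information, f'^T Sigma^{-1} f'."""
    return f_prime @ np.linalg.solve(Sigma, f_prime)

rng = np.random.default_rng(0)
eps = 0.01  # strength of the f'f'^T component; 1/eps = 100 is the information ceiling

for N in (50, 200, 800):
    f_prime = rng.normal(size=N)      # illustrative tuning-curve derivatives
    A = rng.normal(size=(N, N))
    Sigma0 = A @ A.T / N + np.eye(N)  # arbitrary positive-definite Sigma_0
    I0 = linear_fisher(f_prime, Sigma0)

    # Add the pure f'f'^T component of equation (31) and recompute directly
    I_eps = linear_fisher(f_prime, Sigma0 + eps * np.outer(f_prime, f_prime))

    # Direct computation matches equation (34); both saturate below 1/eps
    print(N, I_eps, I0 / (1 + eps * I0))
```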
Perhaps surprisingly, although $f'f'^T$ correlations have a critical role in determining information, they are irrelevant for decoding, in the sense that they have no effect on the locally optimal linear estimator. To see this explicitly, note first of all that the locally optimal linear estimator, denoted $w^T$, generates an estimate of the stimulus near some particular value, $s_0$, by linearly operating on neural activity,
$$\hat{s} = s_0 + w^T \left(r - f(s_0)\right) \tag{35}$$
In the presence of the covariance matrix given in equation (31), the optimal weight, $w_{\mathrm{opt}}^T$, is given by
$$w_{\mathrm{opt}}^T = \frac{f'^T \left(\Sigma_0 + \varepsilon f' f'^T\right)^{-1}}{f'^T \left(\Sigma_0 + \varepsilon f' f'^T\right)^{-1} f'} \tag{36}$$
where we have dropped, for clarity, the explicit dependence on $s_0$. Using equation (33), this reduces to
$$w_{\mathrm{opt}}^T = \frac{f'^T \Sigma_0^{-1}}{f'^T \Sigma_0^{-1} f'} \tag{37}$$
Thus, the locally optimal linear decoder does not need to know the size of the $f'f'^T$ correlations.
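This independence is easy to confirm numerically: in the sketch below (again with an illustrative random $\Sigma_0$ and $f'$), the weights computed from equation (36) with $\varepsilon > 0$ coincide with those from equation (37):

```python
import numpy as np

def w_opt(Sigma, f_prime):
    """Locally optimal linear readout, w = Sigma^{-1} f' / (f'^T Sigma^{-1} f')."""
    v = np.linalg.solve(Sigma, f_prime)
    return v / (f_prime @ v)

rng = np.random.default_rng(1)
N, eps = 100, 0.05
f_prime = rng.normal(size=N)
A = rng.normal(size=(N, N))
Sigma0 = A @ A.T / N + np.eye(N)                       # illustrative Sigma_0
Sigma_eps = Sigma0 + eps * np.outer(f_prime, f_prime)  # add the f'f'^T component

# The weights are identical: the decoder is blind to the size of eps
print(np.allclose(w_opt(Sigma0, f_prime), w_opt(Sigma_eps, f_prime)))  # True
```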
In hindsight this makes sense: $f'f'^T$ correlations shift the hill of activity, and there is, quite literally, nothing any decoder can do about this. This suggests that these correlations are in some sense special. To determine just how special, we ask what happens when we add correlations in a different direction, say correlations of the form $uu^T$, where $u$ is not parallel to $f'$. In that case, the covariance matrix becomes (with a normalization added for convenience only)
$$\Sigma_u(s) = \Sigma_0(s) + \varepsilon\, \frac{f'(s)^T \Sigma_0^{-1}(s)\, f'(s)}{u^T \Sigma_0^{-1}(s)\, u}\, u u^T \tag{38}$$
Repeating the steps leading to equation (34), we find that
$$I_u \equiv f'(s)^T \Sigma_u^{-1}(s)\, f'(s) = I_0 \sin^2\theta + \frac{I_0 \cos^2\theta}{1+\varepsilon I_0} \tag{39}$$
where $I_0$ is defined in equation (29) and
$$\cos\theta \equiv \frac{f'(s)^T \Sigma_0^{-1}(s)\, u}{\left[\left(f'(s)^T \Sigma_0^{-1}(s)\, f'(s)\right)\left(u^T \Sigma_0^{-1}(s)\, u\right)\right]^{1/2}} \tag{40}$$
Whenever $\theta \neq 0$, meaning $u$ is not parallel to $f'(s)$, the first term in equation (39) is proportional to $I_0$, so information does not saturate as $N$ goes to infinity. Thus, in the large $N$ limit, $f'(s)f'(s)^T$ correlations are the only ones that cause saturation.
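Equation (39) can likewise be checked numerically. In the sketch below (illustrative random choices of $\Sigma_0$, $f'$ and $u$, as before), the directly computed $I_u$ matches $I_0\sin^2\theta + I_0\cos^2\theta/(1+\varepsilon I_0)$ and keeps growing with $N$:

```python
import numpy as np

rng = np.random.default_rng(2)
eps = 0.01

for N in (100, 400, 1600):
    f_prime = rng.normal(size=N)
    u = rng.normal(size=N)                    # generically not parallel to f'
    A = rng.normal(size=(N, N))
    Sigma0 = A @ A.T / N + np.eye(N)          # illustrative positive-definite Sigma_0

    S0f = np.linalg.solve(Sigma0, f_prime)    # Sigma_0^{-1} f'
    S0u = np.linalg.solve(Sigma0, u)          # Sigma_0^{-1} u
    I0 = f_prime @ S0f
    uSu = u @ S0u
    cos2 = (f_prime @ S0u) ** 2 / (I0 * uSu)  # cos^2(theta), from equation (40)

    # Covariance with a uu^T component, normalized as in equation (38)
    Sigma_u = Sigma0 + eps * (I0 / uSu) * np.outer(u, u)
    I_u = f_prime @ np.linalg.solve(Sigma_u, f_prime)

    # Direct computation agrees with equation (39), and I_u keeps growing with N
    print(N, I_u, I0 * (1 - cos2) + I0 * cos2 / (1 + eps * I0))
```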