The autoencoders were validated by comparing their different variants with PCA, ICA and Isomap. The chosen datasets, 95 in number and referred to as simulations, originate from the Department of Engineering, University of Leicester, UK, and are publicly available. Each simulation is a dataset. The simulations were created from recordings from the neocortex of a monkey and were generated using 594 different spike shapes [45]. The original study that introduces the simulations [45] also reviews different clustering algorithms and their results. Out of 20 different units, these algorithms were able to detect 10 in the best case.
The datasets were generated from a real dataset recorded in vivo. Each waveform originally contained 316 points sampled at 96 kHz; the sampling frequency was later reduced to 24 kHz, so 79 samples describe a spike. Because the datasets are synthetic, each spike has a label, which allows external metrics to be used to evaluate performance. Each simulation contains a multi-unit cluster, which is the noise, and a number of clusters that varies between 2 and 20. There are 5 simulations for each number of clusters: 5 simulations with 2 clusters, 5 with 3 clusters, and so on.
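The sample and dataset counts quoted above follow from simple arithmetic, sketched below. The variable names are illustrative, and the integer decimation factor of 4 is inferred from the ratio of the two sampling rates rather than stated in the source.

```python
# Sanity check of the figures quoted in the text (names are illustrative).
original_points = 316        # waveform length at the original sampling rate
original_rate_hz = 96_000    # 96 kHz
reduced_rate_hz = 24_000     # 24 kHz

# Assumption: downsampling keeps every 4th point (96 kHz / 24 kHz = 4).
decimation_factor = original_rate_hz // reduced_rate_hz
samples_per_spike = original_points // decimation_factor

# Duration covered by one spike waveform, in milliseconds.
spike_duration_ms = 1000 * samples_per_spike / reduced_rate_hz

# Cluster counts range from 2 to 20, with 5 simulations per count.
cluster_counts = range(2, 21)
simulations_per_count = 5
total_simulations = len(cluster_counts) * simulations_per_count

print(decimation_factor, samples_per_spike, total_simulations)
```

The check reproduces the 79 samples per spike and the 95 simulations (19 cluster counts, 5 simulations each) reported in the text.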
All but one of the clusters are single units located between 0 and 50 μm from the electrode. The firing rate follows a Poisson distribution with a mean between 0.1 and 2 Hz. The amplitudes follow a normal distribution and have been scaled to values between 0.9 and 2 to simulate real data. The data contain no temporally overlapping spikes: consecutive spikes are at least 0.3 ms apart.
The multi-unit cluster was added to make clustering harder for the tested algorithms. The simulated multi-unit contains 20 spike shapes, each from a neuron located 50–140 μm from the electrode. The amplitude of the spikes was fixed to 0.5. The overall composite firing rate is 5 Hz, with each of the 20 constituent neurons firing at a mean rate of 0.25 Hz according to an independent Poisson distribution. For clarity, the multi-unit cluster is color-coded in white in all figures.
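The composite rate of the multi-unit cluster follows from superposing independent Poisson processes: the rates add, so 20 neurons at 0.25 Hz give 5 Hz. The following is a minimal sketch, assuming homogeneous Poisson firing simulated through exponential inter-spike intervals. The function name, recording duration, and seed are hypothetical, and this is not the generator that was used to build the simulations.

```python
import random

def poisson_spike_times(rate_hz, duration_s, rng):
    """Spike times of a homogeneous Poisson process on [0, duration_s)."""
    times = []
    t = rng.expovariate(rate_hz)  # inter-spike intervals are exponential
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_hz)
    return times

rng = random.Random(0)      # fixed seed, chosen arbitrarily
n_neurons = 20
rate_per_neuron_hz = 0.25
duration_s = 2000.0         # illustrative recording length

# Merge the 20 independent trains into a single multi-unit train.
multi_unit = sorted(
    t
    for _ in range(n_neurons)
    for t in poisson_spike_times(rate_per_neuron_hz, duration_s, rng)
)

expected_rate_hz = n_neurons * rate_per_neuron_hz
empirical_rate_hz = len(multi_unit) / duration_s
print(expected_rate_hz, round(empirical_rate_hz, 2))
```

The empirical rate should land close to the expected 5 Hz for a long enough duration. The sketch does not enforce any minimum spacing between spikes.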
To evaluate the proposed approach against other state-of-the-art methods, we chose the following 4 simulations out of the 95 available. They are representative of the issues present in feature extraction methods, and they cover a wide range of cluster counts, enabling a comprehensive evaluation of performance:
These simulations are shown in Fig 3, where PCA is used to reduce the dimensionality from 79 to 2. The overlap between the clusters produced by PCA is clearly visible: in none of the datasets does PCA perfectly separate all clusters.
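The reduction from 79 dimensions to 2 can be sketched as follows. This is a minimal example, assuming the spikes are stored as an array of shape (n_spikes, 79). The random array is placeholder data, not one of the simulations, and `pca_project` is a hypothetical helper that computes PCA through a singular value decomposition.

```python
import numpy as np

def pca_project(spikes, n_components=2):
    """Project rows of `spikes` onto their first principal components."""
    centered = spikes - spikes.mean(axis=0)   # PCA requires centered data
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
spikes = rng.normal(size=(500, 79))   # placeholder for real waveforms
features = pca_project(spikes)

print(features.shape)
```

Each row of `features` is one spike in the 2-dimensional PCA space, which could then be scatter-plotted and colored by the ground-truth labels.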