Consider a network comprising two groups of nodes, $v$ and $h$, with connections between each member of $v$ and each member of $h$ and no connections within $v$ or $h$. The energy of this restricted Boltzmann machine model is $E(v,h) = -\sum_i b_i^v v_i - \sum_j b_j^h h_j - \sum_{i,j} v_i w_{ij} h_j$ and the joint probability is given by $p(v,h) = \exp(-E(v,h)) / \sum_{v,h} \exp(-E(v,h))$.
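The energy and joint probability above can be sketched directly for a small model. The following is a minimal illustration (not the authors' code); the function names and the tiny 2-by-2 example are assumptions, and the partition function is computed by exhaustive enumeration, which is only feasible for very small networks.

```python
import itertools
import numpy as np

def energy(v, h, b_v, b_h, W):
    """E(v,h) = -sum_i b_i^v v_i - sum_j b_j^h h_j - sum_{i,j} v_i w_ij h_j."""
    return -(b_v @ v) - (b_h @ h) - (v @ W @ h)

def joint_probability(v, h, b_v, b_h, W):
    """p(v,h) = exp(-E(v,h)) / Z, with Z summed over all binary configurations."""
    N, M = W.shape
    Z = sum(
        np.exp(-energy(np.array(vv), np.array(hh), b_v, b_h, W))
        for vv in itertools.product([0, 1], repeat=N)
        for hh in itertools.product([0, 1], repeat=M)
    )
    return np.exp(-energy(v, h, b_v, b_h, W)) / Z

# Tiny example: 2 visible and 2 hidden units with random parameters.
rng = np.random.default_rng(0)
b_v, b_h = rng.normal(size=2), rng.normal(size=2)
W = rng.normal(size=(2, 2))
p = joint_probability(np.array([1, 0]), np.array([0, 1]), b_v, b_h, W)
```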
The resulting bipartite structure gives rise to analytic expressions for the conditional probabilities: the probability that $h$ is on given $v$, and the probability that $v$ is on given $h$. Consequently, the conditional distribution $p(h|v)$ is simple to compute; see, for example, [16] for the derivation of the expression $p(h_j = 1 \mid v) = \sigma(b_j + (v^T W)_j)$, with $\sigma$ defined in Eq. (8).
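These closed-form conditionals can be written in a few lines. This is a sketch under the notation above (the function names are assumptions); note that with $v = 0$ the conditional reduces to $\sigma(b_j)$, which gives a quick sanity check.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, b_h, W):
    """p(h_j = 1 | v) = sigma(b_j + (v^T W)_j), exact under the bipartite structure."""
    return sigmoid(b_h + v @ W)

def p_v_given_h(h, b_v, W):
    """By symmetry, p(v_i = 1 | h) = sigma(b_i + (W h)_i)."""
    return sigmoid(b_v + W @ h)

# Small demonstration with 3 visible and 2 hidden units.
b_h = np.array([0.5, -1.0])
b_v = np.array([0.0, 0.3, -0.2])
W = np.array([[0.1, -0.2], [0.4, 0.0], [-0.1, 0.3]])
probs = p_h_given_v(np.array([1, 0, 1]), b_h, W)
```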
As the first step in the transfer of $f_1$ to the QPU, we assign $N$ qubits as input nodes $v$ and $M$ qubits as output nodes $h$. For annealing, the known values of $v$ are realised by setting the strength of the local field biases, $b_v$, so that the $v$ are effectively clamped on or off as appropriate. The local field biases of $h$ are set to $b$ and the coupling strengths between $v$ and $h$ are set to $W$ with coefficients $w_{ij}$, from Eq. (4). Mapping these nodes ($v_i$, $h_j$) and coefficients ($b_i$, $b_j$, $w_{ij}$) to the QPU, and using quantum annealing to obtain samples, is equivalent to sampling a Bernoulli random variable from a suitably defined sigmoid distribution. In summary, we use this equivalence to transfer weights from either a classically trained sigmoid activation layer within a neural network or an RBM to the appropriate number of qubits and associated parameter values. We then run quantum annealing and take samples. These samples correspond to low-energy solutions.
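The equivalence above can be mimicked classically: with $v$ clamped, each $h_j$ is an independent Bernoulli draw with success probability $\sigma(b_j + (v^T W)_j)$. The sketch below is such a classical stand-in for the annealing samples (it does not model the QPU itself); the function name and parameter values are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_clamped(v, b_h, W, n_samples=1000, rng=None):
    """With v clamped, draw h as independent Bernoulli variables with
    success probability sigma(b_j + (v^T W)_j) for each unit j."""
    if rng is None:
        rng = np.random.default_rng()
    p = sigmoid(b_h + v @ W)
    return (rng.random((n_samples, len(b_h))) < p).astype(int)

# Example: clamp v and draw many samples of h.
rng = np.random.default_rng(42)
v = np.array([1, 0, 1])
b_h = np.array([0.2, -0.3])
W = np.array([[0.1, -0.2], [0.4, 0.0], [-0.1, 0.3]])
samples = sample_h_clamped(v, b_h, W, n_samples=20000, rng=rng)
```

The empirical mean of the samples converges to the sigmoid probabilities, which is the property the weight transfer relies on.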
As outlined above, the classical samples come from Eq. (4). However, the quantum samples arise from a probability distribution modified by a temperature coefficient to be estimated from the data [19]. We address this issue by introducing a parameter $S$ and evaluating the sensitivity of the results to it. The purpose of this parameter is to align the classical and quantum Boltzmann distributions according to $f_1(v) = \sigma(S(Wv + b))$.
The classical neural network is then trained using an adapted sigmoid layer with activation $\sigma(Sv)$, and the weights transferred to the QPU are adjusted to $SW$.
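The role of $S$ can be illustrated with a temperature-scaled sigmoid layer. This is a minimal sketch, assuming a plain NumPy layer (the function name is an assumption); $S = 1$ recovers the standard sigmoid, and larger $S$ sharpens the activation toward 0 and 1.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def scaled_sigmoid_layer(v, W, b, S):
    """Adapted activation f1(v) = sigma(S (W v + b)). The parameters
    transferred to the QPU are then the scaled weights S*W and biases S*b."""
    return sigmoid(S * (W @ v + b))

# Example forward pass with and without scaling.
W = np.array([[0.5, -0.2], [0.1, 0.3]])
b = np.array([0.1, -0.1])
v = np.array([1.0, 0.0])
out_unscaled = scaled_sigmoid_layer(v, W, b, S=1.0)
out_scaled = scaled_sigmoid_layer(v, W, b, S=10.0)
```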