Image credit: Tatiana Shepeleva/Shutterstock.

One of the most challenging problems in modern theoretical physics is the so-called many-body problem. Typical many-body systems are composed of a large number of strongly interacting particles. Few such systems are amenable to exact mathematical treatment, and numerical techniques are needed to make progress. However, since the resources required to specify a generic many-body quantum state grow exponentially with the number of particles in the system (more precisely, with the number of degrees of freedom), even today's best supercomputers lack sufficient power to encode such states exactly: they can handle only relatively small systems, with fewer than roughly 45 particles.

As we shall see, recent applications of machine learning techniques (artificial neural networks in particular) have been shown to provide highly efficient representations of such complex states, making their overwhelming complexity computationally tractable.

In this article, I will discuss how to apply (a type of) artificial neural network to represent quantum states of many particles. The article is divided into three parts:

- A bird's-eye view of fundamental quantum mechanical concepts.
- A brief description of machine learning concepts, with a particular focus on a type of artificial neural network known as the Restricted Boltzmann Machine (RBM).
- An explanation of how one can use RBMs to represent many-particle quantum states.

A Preamble

There is a fascinating story recounted by one of Albert Einstein's scientific collaborators, the Polish physicist Leopold Infeld, in his autobiography.

Einstein and Infeld in Einstein's home (source).

According to Infeld, after the two physicists spent several months performing long and grueling calculations, Einstein would make the following remark:

"God [Nature] does not care about our mathematical difficulties.
He integrates empirically." — Einstein (1942)

What Einstein meant was that, while humans must resort to complex calculations and symbolic reasoning to solve complicated physics problems, Nature does not need to.

Quick note: Einstein used the term "integrate" here because many physical theories are formulated using equations called "differential equations", and to find solutions of such equations one must apply the process of "integration".

The Many-Body Problem

As noted in the introduction, a notoriously difficult problem in theoretical physics is the many-body problem. This problem has been investigated for a very long time in both classical systems (physical systems based on Newton's three laws of motion and their refinements) and quantum systems (systems obeying quantum mechanical laws).

The first (classical) many-body problem to be extensively studied was the 3-body problem involving the Earth, the Moon, and the Sun.

A simple orbit of a 3-body system with equal masses.

One of the first scientists to attack this many-body problem was none other than Isaac Newton in his masterpiece, the Principia Mathematica:

"Each time a planet revolves it traces a fresh orbit [...] and each orbit is dependent upon the combined motions of all the planets, not to mention their actions upon each other [...]. Unless I am much mistaken, it would exceed the force of human wit to consider so many causes of motion at the same time, and to define the motions by exact laws which would allow of an easy calculation." — Isaac Newton (1687)

Newton's Principia Mathematica, arguably the most important scientific book in history.

Since essentially all relevant physical systems are composed of a collection of interacting particles, the many-body problem is extremely important.

A Poor Man's Definition

One can define the problem as "the study of the effects of interactions between bodies on the behavior of a many-body system".
Collisions of gold ions generate a quark-gluon plasma, a typical many-body system.

The meaning of "many" in this context can be anywhere from three to infinity. In a recent paper, my colleagues and I showed that signatures of quantum many-body behavior can be found already for N = 5 spin excitations (figure below).

The density of states of a type of spin system (the XX model). As the number of spin excitations increases from 2 to 5, a Gaussian distribution (typical of many-body systems with 2-body couplings) is approached.

In the present article, I will focus on the quantum many-body problem, which has been my main topic of research since 2013.

Quantum Many-Body Systems

The complexity of quantum many-body systems was identified by physicists already in the 1930s. Around that time, the great physicist Paul Dirac envisioned two major problems in quantum mechanics.

The English physicist Paul Dirac.

The first, according to him, was "in connection with the exact fitting in of the theory with relativity ideas". The second was that "the exact application of these [quantum] laws leads to equations much too complicated to be soluble". The second problem was precisely the quantum many-body problem.

Luckily, the quantum states of many physical systems can be described using much less information than the maximum capacity of their Hilbert spaces. This fact is exploited by several numerical techniques, including the well-known Quantum Monte Carlo (QMC) method.

Quantum Wave Functions

Simply put, a quantum wave function describes mathematically the state of a quantum system. The first quantum system to receive an exact mathematical treatment was the hydrogen atom.

The probability of finding the electron in a hydrogen atom (represented by the brightness).

In general, a quantum state is represented by a complex probability amplitude Ψ(S), where the argument S contains all the information about the system's state.
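As an aside, the exponential cost mentioned in the introduction is easy to make concrete. For N spin-1/2 particles, the state S is a list of N values σ = ±1, so a generic Ψ(S) has 2^N complex amplitudes. A minimal sketch (the choice of double-precision complex numbers, 16 bytes each, is my own illustrative assumption):

```python
# Memory needed to store a generic N-spin wave function:
# 2**N complex amplitudes at 16 bytes each (complex128).
def wavefunction_bytes(n_spins: int) -> int:
    return (2 ** n_spins) * 16

# A 20-spin state fits in a laptop's RAM;
# ~45 spins already needs hundreds of terabytes.
for n in (20, 30, 45):
    print(f"N = {n}: {wavefunction_bytes(n) / 1e9:,.3f} GB")
```

This is the wall that the neural-network representations discussed below are designed to get around.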
For example, in a spin-1/2 chain:

A 1D spin chain: each particle has a value of σ along the z-axis.

From Ψ(S), probabilities associated with measurements made on the system can be derived. For example, the squared modulus of Ψ(S), a positive real number, gives the probability distribution associated with Ψ(S):

P(S) = |Ψ(S)|²

The Hamiltonian Operator

The properties of a quantum system are encapsulated by the system's Hamiltonian operator H, which is the sum of two terms:

- The kinetic energy of all particles in the system, associated with their motion.
- The potential energy of all particles in the system, associated with the positions of the particles with respect to one another.

The allowed energy levels of a quantum system (its energy spectrum) can be obtained by solving the so-called Schrödinger equation, a partial differential equation that describes the behavior of quantum mechanical systems.

The Austrian physicist Erwin Schrödinger, one of the fathers of quantum mechanics.

The time-independent version of the Schrödinger equation is given by the eigenvalue problem

H Ψ_n = E_n Ψ_n,

whose eigenvalues E_n are the allowed energies and whose eigenstates Ψ_n are the corresponding quantum states. The lowest energy corresponds to the so-called "ground state" of the system.

A Simple Example

For concreteness, let us consider the following example: the quantum harmonic oscillator (QHO). The QHO is the quantum-mechanical counterpart of the classical harmonic oscillator (see the figure below), a system that experiences a restoring force when displaced from its equilibrium position.

A mass-spring harmonic oscillator.

The animation below compares the classical and quantum conceptions of a simple harmonic oscillator.

Wave function describing a quantum harmonic oscillator (Wiki).

While a simple oscillating mass in a well-defined trajectory represents the classical system (blocks A and B in the figure above), the corresponding quantum system is represented by a complex wave function.
In each block (from C onwards) there are two curves: the blue one is the real part of Ψ, and the red one is the imaginary part.

Bird's-Eye View of Quantum Spin Systems

In quantum mechanics, spin can be roughly understood as an "intrinsic form of angular momentum" carried by particles and nuclei. Though it is intuitive to think of spin as a rotation of a particle around its own axis, this picture is not quite correct: the particle would then have to rotate faster than the speed of light, which would violate fundamental physical principles. In fact, spins are quantum mechanical objects without classical counterpart.

Example of a many-body system: a spin impurity propagating through a chain of atoms.

Quantum spin systems are closely associated with the phenomenon of magnetism. Magnets are made of atoms, which often act as small magnets themselves. When these atomic magnets become aligned in parallel, they give rise to the macroscopic effect we are familiar with. Magnetic materials often display spin waves, propagating disturbances in the magnetic order.

I will now provide a quick summary of the basic components of machine learning algorithms, in a way that will help the reader understand their connections with quantum systems.

Machine Learning = Machine + Learning

Machine learning approaches have two basic components (Carleo, 2017):

- The machine, which could be e.g. an artificial neural network Ψ with parameters W.
- The learning of the parameters W, performed using e.g. stochastic optimization algorithms.

The two components of machine learning.

Neural Networks

Artificial neural networks are usually non-linear, multi-dimensional nested functions. Their internal workings are only heuristically understood, and investigating their structure does not generate insights regarding the function being approximated.
Simple artificial neural network with two hidden layers.

Due to the absence of a clear-cut connection between the network parameters and the mathematical function being approximated, ANNs are often referred to as "black boxes".

What Are Restricted Boltzmann Machines?

Restricted Boltzmann Machines are generative stochastic neural networks. They have many applications, including:

- Collaborative filtering
- Dimensionality reduction
- Classification
- Regression
- Feature learning
- Topic modeling

RBMs belong to a class of models known as energy-based models. They differ from other (more popular) neural networks, which estimate a value based on inputs: RBMs instead estimate probability densities of the inputs (they estimate many points instead of a single value).

RBMs have the following properties:

- They are shallow networks, with only two layers (the input/visible layer and a hidden layer).
- Their hidden units h and visible (input) units v are usually binary-valued.
- There is a weight matrix W associated with the connections between hidden and visible units.
- There are two bias terms, one for the input units, denoted by a, and one for the hidden units, denoted by b.
- Each configuration has an associated energy functional E(v,h), which is minimized during training.
- They have no output layer.
- There are no intra-layer connections (this is the "restriction"). For a given set of visible unit activations, the hidden unit activations are mutually independent (the converse also holds). This property facilitates the analysis tremendously.

The energy functional to be minimized is given by:

Eq. 1: E(v, h) = − Σ_i a_i v_i − Σ_j b_j h_j − Σ_ij v_i W_ij h_j (the energy functional minimized by RBMs).

The joint probability distribution of the visible and hidden units reads:

Eq. 2: P(v, h) = e^{−E(v, h)} / Z (the total probability distribution),

where the normalization constant Z is called the partition function.
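For an RBM small enough to enumerate, Eqs. 1 and 2 can be checked by brute force. The sketch below (the layer sizes, random parameters, and seed are my own illustrative choices) computes E(v, h) for every joint configuration, builds the partition function Z, and verifies that the resulting probabilities sum to one:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h = 3, 2                        # tiny, so every configuration can be listed
a = rng.normal(size=n_v)               # visible biases
b = rng.normal(size=n_h)               # hidden biases
W = rng.normal(size=(n_v, n_h))        # visible-hidden weights

def energy(v, h):
    """Eq. 1: E(v, h) = -a.v - b.h - v.W.h"""
    return -(a @ v) - (b @ h) - (v @ W @ h)

configs = [(np.array(v), np.array(h))
           for v in itertools.product([0, 1], repeat=n_v)
           for h in itertools.product([0, 1], repeat=n_h)]

Z = sum(np.exp(-energy(v, h)) for v, h in configs)   # partition function
probs = [np.exp(-energy(v, h)) / Z for v, h in configs]

print(abs(sum(probs) - 1.0) < 1e-9)   # Eq. 2 defines a normalized distribution
```

Of course, the whole point of the RBM machinery is that for realistically sized models Z cannot be enumerated like this, which is why training relies on sampling instead.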
Tracing out the hidden units, we obtain the marginal probability of a visible (input) vector:

Eq. 3: P(v) = (1/Z) Σ_h e^{−E(v, h)} (the marginal probability distribution of the input units).

Since, as noted before, hidden (visible) unit activations are mutually independent given the visible (hidden) unit activations, one can write:

Eq. 4: P(h|v) = Π_j P(h_j|v) (the conditional probability becomes a product due to mutual independence),

and also:

Eq. 5: P(v|h) = Π_i P(v_i|h).

Finally, the activation probabilities read:

Eq. 6: P(h_j = 1|v) = σ(b_j + Σ_i v_i W_ij) and P(v_i = 1|h) = σ(a_i + Σ_j W_ij h_j),

where σ is the sigmoid function.

The training steps are the following:

- We begin by setting the states of the visible units to a training vector.
- The states of the hidden units are then calculated using the first expression in Eq. 6.
- After the states of the hidden units are chosen, one performs the so-called "reconstruction", setting each visible unit to 1 with the probability given by the second expression in Eq. 6.
- The weights change by Δw_ij = ε(⟨v_i h_j⟩ − ⟨v'_i h'_j⟩), where ε is a learning rate and the primed variables are the reconstructions.

How RBMs Process Inputs: A Simple Example

The following analysis is heavily based on this excellent tutorial. The three figures below show how an RBM processes inputs.

A simple RBM processing inputs.

At node 1 of the hidden layer, the input x is multiplied by the weight w, a bias b is added, and the result is fed into the activation function, giving rise to an output a (see the leftmost diagram).

In the central diagram, all inputs are combined at hidden node 1: each input x is multiplied by its corresponding weight w, the products are summed, a bias b is added, and the end result is passed into an activation function, producing the full output a of hidden node 1.

In the third diagram, the inputs x are passed to all nodes in the hidden layer. At each hidden node, each x is multiplied by its corresponding weight w, so each hidden node receives the products of all inputs x with their individual weights w.
The bias b is then added to each sum, and the results are passed through activation functions, generating outputs at all hidden nodes.

How RBMs Learn to Reconstruct Data

RBMs perform an unsupervised process called "reconstruction". They learn to reconstruct the data by performing a long succession of forward and backward passes between their two layers. In the backward pass, as shown in the diagram below, the activations of the nodes in the hidden layer become the new inputs.

The products of these inputs and their respective weights are summed, and the new biases b of the visible layer are added at each input node. The new output of these operations is called the "reconstruction" because it is an approximation of the original input.

Naturally, the reconstructions and the original inputs are very different at first (since the values of w are randomly initialized). However, as the error is repeatedly backpropagated against the ws, it is gradually minimized.

We see, therefore, that:

- On the forward pass, the RBM uses the inputs to make predictions about the activations of the nodes, estimating the probability distribution of the output a conditional on the weighted inputs x.
- On the backward pass, the RBM tries to estimate the probability distribution of the inputs x conditional on the activations a.
- Joining both conditional distributions, the joint probability distribution of x and a is obtained, i.e. the RBM learns how to approximate the original data (the structure of the input).

How to Connect Machine Learning and Quantum Systems?

In a recent article published in Science magazine, it was proposed that one can treat the quantum wave function Ψ(S) of a quantum many-body system as a black box and then approximate it using an RBM. The RBM is trained to represent Ψ(S) via the optimization of its parameters.
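Before turning to the quantum case, the classical RBM training procedure described above (the four steps plus the forward/backward reconstruction passes) can be condensed into a single update, known as contrastive divergence (CD-1). The layer sizes, seed, learning rate, and training vector below are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_v, n_h, lr = 6, 3, 0.1
W = rng.normal(scale=0.1, size=(n_v, n_h))   # weights
a = np.zeros(n_v)                            # visible biases
b = np.zeros(n_h)                            # hidden biases

def cd1_step(v0):
    """One contrastive-divergence (CD-1) weight update for one training vector."""
    # Forward pass: sample hidden states from p(h=1|v0) = sigmoid(b + v0.W)
    p_h0 = sigmoid(b + v0 @ W)
    h0 = (rng.random(n_h) < p_h0).astype(float)
    # Backward pass ("reconstruction"): p(v=1|h0) = sigmoid(a + W.h0)
    p_v1 = sigmoid(a + W @ h0)
    v1 = (rng.random(n_v) < p_v1).astype(float)
    p_h1 = sigmoid(b + v1 @ W)
    # Weight change from the data vs. reconstruction correlations
    return lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))

v0 = np.array([1., 0., 1., 1., 0., 0.])      # a made-up binary training vector
W += cd1_step(v0)                            # one training iteration
```

Repeating this step over a dataset slowly pulls the model distribution toward the data distribution.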
The RBM used by Carleo and Troyer (2017) to encode a spin many-body quantum state.

The question is how to reformulate the (time-independent) Schrödinger equation, which is an eigenvalue problem, as a machine learning problem.

Variational Methods

As it turns out, the answer has been known for quite some time, and it is based on the so-called variational method, an alternative formulation of the wave equation that can be used to obtain the energies of a quantum system. Using this method, we can write the optimization problem as

E₀ = min over Ψ of E[Ψ], with E[Ψ] = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩,

where E[Ψ] is a functional that depends on the eigenstates and the Hamiltonian. Solving this optimization problem, we obtain both the ground-state energy and the corresponding ground state.

Quantum States and Restricted Boltzmann Machines

In Carleo and Troyer (2017), RBMs are used to represent a quantum state Ψ(S), with the RBMs generalized to allow for complex network parameters. It is easy to show that the energy functional can be written as

E[Ψ] = ⟨Ψ|H|Ψ⟩ / ⟨Ψ|Ψ⟩ = Σ_S |Ψ(S)|² E_loc(S) / Σ_S |Ψ(S)|², with E_loc(S) = ⟨S|H|Ψ⟩ / ⟨S|Ψ⟩,

where the argument of the expectation value after the last equals sign, E_loc(S), is the local energy. The neural network is then trained using the method of Stochastic Reconfiguration (SR). The corresponding optimization iteration (a gradient-descent-like update protocol) reads

W(k+1) = W(k) − η S(k)⁻¹ F(k),

where η is the learning rate, F(k) is the vector of energy gradients (the "forces"), and S(k) is the stochastic reconfiguration matrix, which depends on the wave function and its logarithmic derivatives. Carleo and Troyer (2017) were interested specifically in quantum systems of spin-1/2 particles, and they write the quantum state as

Ψ(S; W) = Σ over {h_i} of exp( Σ_j a_j σ^z_j + Σ_i b_i h_i + Σ_ij W_ij h_i σ^z_j ),

where the sum runs over the hidden-spin configurations h_i ∈ {−1, 1}. In this expression, the W argument of Ψ stands for the full set of parameters {a, b, W}, where the components of a and b are real but W can be complex.
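Because each hidden spin appears independently in the exponent, the sum over hidden configurations can be checked numerically against the closed form obtained from the identity Σ over h = ±1 of e^{hx} = 2 cosh x. The sketch below (small sizes, a made-up spin configuration, and random parameters of my own choosing, with real a and b and complex W as in the ansatz) compares the two:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
n_spins, n_hidden = 4, 3
a = rng.normal(size=n_spins)          # real visible biases
b = rng.normal(size=n_hidden)         # real hidden biases
W = rng.normal(size=(n_hidden, n_spins)) + 1j * rng.normal(size=(n_hidden, n_spins))

sigma = np.array([1, -1, 1, 1])       # one spin configuration S

# Brute-force sum over all hidden configurations h_i = ±1
psi_sum = sum(
    np.exp(a @ sigma + np.asarray(h) @ b + np.asarray(h) @ W @ sigma)
    for h in itertools.product([-1, 1], repeat=n_hidden)
)

# Traced-out closed form: exp(a.sigma) * prod_i 2 cosh(b_i + (W.sigma)_i)
psi_closed = np.exp(a @ sigma) * np.prod(2 * np.cosh(b + W @ sigma))

print(np.isclose(psi_sum, psi_closed))   # True: the two expressions agree
```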
The absence of intra-layer interactions, typical of the RBM architecture, allows the hidden variables to be summed over (or traced out), considerably simplifying the expression above to

Ψ(S; W) = exp( Σ_j a_j σ^z_j ) × Π_i 2 cosh( b_i + Σ_j W_ij σ^z_j ).

To train the quantum wave function, one follows a procedure similar to the one described above for RBMs.

Impressive Accuracy

The figure below shows the negligible relative error of the NQS ground-state energy estimate. Each plot corresponds to a test case, i.e. a system with a known exact solution. The horizontal axis is the hidden-unit density, i.e. the ratio between the numbers of hidden and visible units. Notice that even with relatively few hidden units, the accuracy of the model is already extremely impressive (an error of one part per million!).

The error of the model's ground-state energy relative to the exact value in three test cases.

Conclusion

In this brief article, we saw that Restricted Boltzmann Machines (RBMs), a simple type of artificial neural network, can be used to compute, with extremely high accuracy, the ground-state energy of quantum systems of many particles.

Thanks for reading! As always, constructive criticism and feedback are welcome! This article was originally published here.


Nowadays, artificial intelligence is present in almost every part of our lives. Smartphones, social media feeds, recommendation engines, online ad networks, and navigation tools are examples of AI-based applications that affect us on a daily basis.

Deep learning has been systematically improving the state of the art in areas such as speech recognition, autonomous driving, machine translation, and visual object recognition. However, the reasons why deep learning works so spectacularly well are not yet fully understood.

Hints from Mathematics

Paul Dirac, one of the fathers of quantum mechanics and arguably the greatest English physicist since Sir Isaac Newton, once remarked that progress in physics using the "method of mathematical reason" would

"…enable one to infer results about experiments that have not been performed. There is no logical reason why the […] method should be possible at all, but one has found in practice that it does work and meets with reasonable success. This must be ascribed to some mathematical quality in Nature, a quality which the casual observer of Nature would not suspect, but which nevertheless plays an important role in Nature's scheme." — Paul Dirac, 1939

Portrait of Paul Dirac at the peak of his powers (Wikimedia Commons).

There are many examples in history where purely abstract mathematical concepts eventually led to powerful applications far beyond the context in which they were developed. This article is about one of those examples.

Though I've been working with machine learning for a few years now, I'm a theoretical physicist by training, and I have a soft spot for pure mathematics. Lately, I have been particularly interested in the connections between deep learning, pure mathematics, and physics.

This article provides examples of powerful techniques from a branch of mathematics called mathematical analysis.
My goal is to use rigorous mathematical results to try to "justify", at least in some respects, why deep learning methods work so surprisingly well.

Abstract representation of a neural network (source).

A Beautiful Theorem

In this section, I will argue that one of the reasons why artificial neural networks are so powerful is intimately related to the mathematical form of the outputs of their neurons.

A manuscript by Albert Einstein (source).

I will justify this bold claim using a celebrated theorem originally proved by two Russian mathematicians in the late 1950s, the so-called Kolmogorov-Arnold representation theorem.

The mathematicians Andrei Kolmogorov (left) and Vladimir Arnold (right).

Hilbert's 13th Problem

In 1900, David Hilbert, one of the most influential mathematicians of the 20th century, presented a famous collection of problems that effectively set the course of 20th-century mathematics research. The Kolmogorov-Arnold representation theorem is related to one of these celebrated problems, the 13th.

Closing In on the Connection with Neural Networks

A generalization of the 13th problem considers the possibility that a function of n variables can be expressed as a combination of sums and compositions of just two functions of a single variable, denoted by Φ and ϕ. More concretely (the Kolmogorov-Arnold representation theorem, in Sprecher's form):

f(x_1, …, x_n) = Σ from q = 0 to 2n of Φ( Σ from p = 1 to n of λ_p ϕ(x_p + qη) )

Here, η and the λ_p are real numbers. It should be noted that the two univariate functions Φ and ϕ can have a highly complicated (fractal) structure. Three articles, by Kolmogorov (1957), Arnold (1958), and Sprecher (1965), provided proofs that such a representation must exist.
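A toy version of this idea, of my own choosing and much weaker than the theorem itself: even ordinary multiplication, a function of two variables, can be rebuilt from sums and a single univariate function (squaring), via the identity xy = ((x + y)² − (x − y)²) / 4:

```python
def square(t):
    # the only univariate building block we allow ourselves
    return t * t

def product(x, y):
    """Multiplication expressed as sums and compositions of a univariate function."""
    return (square(x + y) - square(x - y)) / 4

print(product(3.0, 7.0))   # 21.0
```

The theorem makes the far stronger claim that this kind of rewriting works for every continuous multivariate function, with fixed Φ and ϕ.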
This result is rather unexpected: according to it, the bewildering complexity of multivariate functions can be "translated" into trivial operations on univariate functions, such as additions and function compositions.

Now What?

If you got this far (and I would be thrilled if you did), you are probably wondering: how could an esoteric theorem from the 50s and 60s be even remotely related to cutting-edge algorithms such as artificial neural networks?

A Quick Reminder of Neural Network Activations

The expressions computed at each node of a neural network are compositions of other functions, in this case the so-called activation functions. The degree of complexity of such compositions depends on the depth of the hidden layer containing the node. For example, the k-th node in the second hidden layer performs the following computation:

y_k = ϕ( Σ_j w(2)_kj ϕ( Σ_i w(1)_ji x_i + b(1)_j ) + b(2)_k )

where the ws are the weights and the bs are the biases. The similarity with the multivariate function f shown a few paragraphs above is evident!

Let us quickly write down a Python function, for forward propagation only, which outputs the calculations performed by the neurons.
The code for the function below has the following steps:

- First line: the first activation function ϕ (phi) acts on the first linear step x0.dot(w1) + b1, where x0 is the input vector.
- Second line: the second activation function acts on the second linear step y1.dot(w2) + b2.
- Third line: a softmax function is used in the final layer of the neural network, acting on the third linear step y2.dot(w3) + b3.

The full function (with phi taken to be np.tanh purely for concreteness; any activation works) is:

```python
import numpy as np

phi = np.tanh                                    # any activation function will do
softmax = lambda z: np.exp(z) / np.exp(z).sum()  # normalized exponentials

def forward_propagation(w1, b1, w2, b2, w3, b3, x0):
    y1 = phi(x0.dot(w1) + b1)        # first linear step + activation
    y2 = phi(y1.dot(w2) + b2)        # second linear step + activation
    y3 = softmax(y2.dot(w3) + b3)    # output layer
    return y1, y2, y3
```

To compare this with our expression above, we write:

y2 = phi(phi(x0.dot(w1) + b1).dot(w2) + b2)

The correspondence with the Kolmogorov-Arnold representation can then be made clear.

A Connection Between Two Worlds

We therefore conclude that the result proved by Kolmogorov, Arnold, and Sprecher implies that neural networks, whose output is nothing but the repeated composition of functions, are extremely powerful objects, capable of representing any continuous multivariate function or, equivalently, almost any process in nature. This partly explains why neural networks work so well in so many fields. In other words, the generalization power of neural networks is, at least in part, a consequence of the Kolmogorov-Arnold representation theorem.

As pointed out by Giuseppe Carleo, the power of forming functions of functions of functions, ad nauseam, was, in a way, "discovered independently also by nature": neural networks, which as shown above do precisely that, are a simplified description of how our brains work.

Thanks a lot for reading! Constructive criticism and feedback are always welcome! There is a lot more to come, stay tuned! Originally posted here.
