From magnets to the mind: How Hopfield and Hinton bridged physics and AI
December 22, 2024 | Sitabhra Sinha
Unlike the Nobel prizes in Peace and, to some extent, Literature, those for the sciences, namely Physics, Chemistry and Physiology or Medicine, rarely give rise to much public discussion about their appropriateness. An exception seems to have occurred with this year’s Physics prize given to John Hopfield and Geoffrey Hinton "for foundational discoveries and inventions that enable machine learning with artificial neural networks" (as the press statement released by the Nobel Prize committee says).
Almost everyone, whatever their expertise (or lack of it), seems to have an opinion – or at least questions – as to why the prize ostensibly meant for “the most important discovery or invention in the field of physics” – the exact wording in Alfred Nobel’s will instituting these prizes – should be given for “artificial intelligence” (as the media seems to have portrayed it). In fact, the spectacle strongly recalls the furore attending the award of the 2016 prize in Literature to the songwriter Bob Dylan, which was accompanied by “raucous cheering, disbelieving emojis and graceless carping” (in the words of Amit Chaudhuri writing in The Guardian).
Indeed, the similarities seem to go deeper. In neither case was there much disagreement about the importance of the body of work that was recognized. Dylan’s worldwide fame could hardly increase any further by awarding him a prize, even a Nobel – so the controversy was about whether his songs (or, as some disparagingly put it, “pop songs”) can be classified as “literature”. Similarly, Hopfield and Hinton, while not household names like Dylan, are well-known and respected figures in the international scientific community whose contributions have long been recognized by their peers – for instance, by awarding Hopfield the extremely prestigious Boltzmann medal in 2022, which is given for the most outstanding achievements in statistical physics. Earlier winners of the medal, such as Kenneth Wilson and Giorgio Parisi, have also gone on to win the Nobel prize in physics. So the puzzlement among the public (and even among some scientists who really ought to know better, one of whom made the ridiculous claim that it was given because Nobel hadn’t made any provision for prizes in computer science) seems to be primarily about how this work falls within the ambit of “physics”.
A possible answer is that, apart from the boundaries between physics and, say, engineering being somewhat arbitrary (in any case, they did not exist as separate fields even a couple of centuries ago), the prize seems to be a nod towards science finally breaking down the millennia-old division between physical (or “inanimate”) and living systems. We can trace the beginning of this enterprise – at least for our purpose – to the 1940s when, by one of those curious coincidences of history, scientists began asking if the mathematical physics approach that had been so successful in revealing the subatomic world through the quantum revolution could also reveal what life is, as well as how the brain works. During these years, physicists such as Niels Bohr and Erwin Schrödinger were grappling with the question of the mechanism by which the cell encodes information that is inherited from one generation to the next – an initiative that within a few years would lead to the discovery of the structure of DNA by the physicist Francis Crick working with James Watson. Around the same time, in Chicago, the neurophysiologist Warren McCulloch, working together with the gifted young mathematician Walter Pitts (who was so precocious that he had been invited at the age of 12 by Bertrand Russell to study mathematical logic at Cambridge), published a paper that viewed neurons as effectively just ON-OFF switches. By making the neurons flip between OFF and ON states on receiving signals from other neurons connected to them, McCulloch and Pitts showed that one can build logic circuits that can, in principle, perform any complex operation one can think of.
Figure 1. A schematic diagram of a representative biological neuron (top) compared with the operating principle of the McCulloch-Pitts model of a neuron (bottom), which is essentially a binary switch (i.e., having only two states, viz., ON or active – denoted by 1 – and OFF or inactive – denoted by 0). Just as the dendrites emanating from the neuronal cell body allow it to obtain information about the activation status of neighboring neurons (i.e., whether they are firing action potentials or not), with the signals from different neurons having differing influence in terms of their ability to evoke a response, the McCulloch-Pitts neuron receives as input the binary states (xi, i = 1, …, N, each of which can only take the value 0 or 1) of all its N “pre-synaptic” neurons, which are then weighted by the corresponding synaptic strength (wi for neuron i). These inputs are summed and compared against an intrinsic threshold. If the weighted sum exceeds the threshold, the neuron fires an action potential (i.e., it turns ON) and the signal is conveyed down the axon to its post-synaptic neurons; else it remains quiescent (OFF).

Figure 2. Simple neuronal circuits can be constructed to implement various logical operations on input signals. As pointed out by McCulloch and Pitts, a network comprising two input neurons (whose activity states are denoted a and b), which are stimulated – or not – by incoming stimuli and which provide synaptic connections to the output neurons x and y (left), can function as either an OR logic gate or an AND logic gate, depending on the connection weights and thresholds. The input-output diagram of each gate is indicated (center: OR, right: AND), with the axes representing the activity (low or high) of the input neurons and the symbols – dots (ON) and crosses (OFF) – representing the state of the output neuron. For the OR gate, high activity in either input neuron results in the output neuron switching to the ON state, while for the AND gate the output neuron turns ON only when both input neurons are highly activated.

Figure 3. By introducing an additional layer (so-called “hidden” because the neurons in this layer are neither directly accessible to external stimuli nor is their response immediately observable, unlike the output layer) between the input and the output layers (left), the network becomes capable of implementing arbitrarily complex logical operations. In particular, it can implement XOR logic, whose input-output diagram is shown (right): the output neuron switches to the ON state (indicated by dots) if and only if exactly one of the input neurons is highly active. Otherwise, if both input neurons are highly active or both are inactive, the output shows no activity (crosses).
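For readers who like to see things concretely, here is a minimal Python sketch (not taken from the original 1943 paper) of such a threshold unit, wired up into the OR, AND and XOR circuits of Figures 2 and 3. The specific weights and thresholds, the convention that the unit fires when the weighted sum reaches the threshold, and the use of negative weights to stand in for inhibitory connections are all illustrative choices.

```python
def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: fire (1) if the weighted sum of its binary
    inputs reaches the threshold, stay quiescent (0) otherwise."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def OR(a, b):
    # Either input alone is enough to reach the threshold.
    return mp_neuron([a, b], [1, 1], threshold=1)

def AND(a, b):
    # Both inputs must be active to reach the threshold.
    return mp_neuron([a, b], [1, 1], threshold=2)

def XOR(a, b):
    # No single threshold unit can compute XOR; a "hidden" layer can.
    # h1 detects "a but not b", h2 detects "b but not a" (negative weights
    # stand in for inhibition), and the output fires if either hidden unit does.
    h1 = mp_neuron([a, b], [1, -1], threshold=1)
    h2 = mp_neuron([a, b], [-1, 1], threshold=1)
    return mp_neuron([h1, h2], [1, 1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", "OR:", OR(a, b), "AND:", AND(a, b), "XOR:", XOR(a, b))
```

Running the loop prints the familiar truth tables, with XOR requiring the extra “hidden” layer exactly as described in Figure 3.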
In the years after the war, the theoretical breakthrough of McCulloch and Pitts led Frank Rosenblatt to use this idea to build early machines that could recognize simple images – but the general problem of how to actually design the circuits needed to solve any given task remained unsolved. As it happened, the clue for how to do this was already there for anyone who knew where to look. In 1949, the neuropsychologist Donald Hebb came up with a deceptively simple hypothesis of how the brain “learns”, which can be summed up as “neurons that fire together, wire together”. Thus, if two neurons consistently fire together while performing a particular task, the connection between them will strengthen, making it even more likely that they will fire together in the future. In other words, when a network of artificial neurons is repeatedly shown a series of patterns, each corresponding to some of the neurons firing while others remain silent, it will “learn” to settle into such patterns by altering the mutual connections – thereby “memorizing” the patterns shown.
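In code, Hebb's idea amounts to strengthening the connection between any two units that are active together. The sketch below is a loose paraphrase rather than anything Hebb himself wrote down – in particular, the ±1 coding and the outer-product form anticipate the rule Hopfield would later use.

```python
import numpy as np

def hebbian_update(W, pattern, rate=1.0):
    """Strengthen the link between any two units that take the same state
    (+1/-1) in the shown pattern, and weaken it when they disagree."""
    p = np.asarray(pattern, dtype=float)
    dW = rate * np.outer(p, p)      # "fire together, wire together"
    np.fill_diagonal(dW, 0.0)       # no self-connections
    return W + dW
```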
In fact, in 1974, William Little, a member of the Physics Department at Stanford University, published a model in which he showed how implementing Hebb’s rule can generate short-term and long-term memory in a network of switch-like neurons. The stage was now set for John J Hopfield’s landmark paper of 1982, in which he went beyond Little’s work by deviating a little from biological realism – something the physicist Daniel Amit termed “a brilliant step backwards”. Specifically, Hopfield assumed that if neuron A is connected to neuron B with some strength W, then B is also connected to A with the same strength W. This simple assumption allows one to associate an energy with each brain state, defined as a particular configuration in which some neurons are on (“firing”) while the others are off (“quiescent”). To give an example, consider swinging a pendulum held in your hand. If you give it an initial jerk, you are supplying some energy which makes it swing from side to side – thus the initial state has high energy. If you now hold your hand still, over time the forces of friction will make the energy dissipate into the surroundings and the pendulum will gradually come to rest hanging vertically. Similarly, in Hopfield’s neural network model, the arbitrary state from which the “brain” starts is a high-energy state, and over time the network gradually settles into the closest available minimum-energy state – which is precisely a memory learnt through Hebb’s rule. Hopfield further showed that by mapping the language of ON/OFF neurons to that of tiny magnetic spins oriented either UP or DOWN, this description of the brain is essentially equivalent to the random magnets (technically called “spin glasses”) that physicists had been studying for some time.
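A minimal sketch of this picture, assuming ±1 spin states, symmetric weights with zero diagonal, no firing thresholds, and a fixed number of updates in place of a proper convergence test: the energy function below is the standard one associated with Hopfield's 1982 paper, and each asynchronous update can only lower the energy or leave it unchanged, which is why the network always slides downhill into a nearby minimum.

```python
import numpy as np

def energy(W, s):
    """E = -1/2 * sum over i,j of W[i, j] * s[i] * s[j]."""
    return -0.5 * s @ W @ s

def relax(W, s, updates=1000, seed=0):
    """Asynchronous dynamics: repeatedly pick a neuron at random and align
    it with the sign of its total input from the rest of the network."""
    rng = np.random.default_rng(seed)
    s = np.array(s, dtype=float)
    for _ in range(updates):
        i = rng.integers(len(s))
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s
```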
Figure 4. The neural network model for human-like associative memory proposed by Hopfield in 1982 assumes that every neuron is connected to every other in the network of N neurons being considered – schematically shown for N=5 (left top) and N=3 (left bottom). The connections between neurons are undirected, as Hopfield’s learning rule (based on Hebb’s principle) ensures that the connection weights are symmetric – i.e., if neuron i connects to neuron j with synaptic strength Wij, and neuron j connects to neuron i with synaptic strength Wji, then Wij = Wji for all i ≠ j. Each state of the network can be defined as the sequence of spin orientations Si that represent the ON/OFF status of each neuron (i = 1, …, N) in the network, i.e., {S1, S2, …, SN}. It is easy to see that there are a total of 2^N possible network states, from {–1, –1, …, –1} to {1, 1, …, 1}. The dynamics, i.e., time-evolution from one network state to another, is shown for an N=3 Hopfield network. The synaptic strengths, calculated according to Hopfield’s learning rule, ensure that the state ξ={1, 1, –1} is an attractor – i.e., the network has memorized the single pattern ξ (right). Small variations from this state will flow back to it, as shown above. Larger variations will flow to the complementary state ξ’ = –ξ = {–1, –1, 1}, obtained by “flipping” all the spin states in ξ, which is therefore the other attractor of the network dynamics. The colored plane separates the basins of attraction of the two attractors.
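The N=3 example of Figure 4 can be reproduced in a few lines. The sketch below stores the single pattern ξ = {1, 1, –1} using the outer-product form of the learning rule and lets two different initial states settle; the deterministic sweep order and the 1/N normalization are simplifications chosen here for brevity.

```python
import numpy as np

xi = np.array([1.0, 1.0, -1.0])              # the pattern to be memorized

# Learning rule for a single pattern: Wij = xi_i * xi_j / N, no self-connections.
W = np.outer(xi, xi) / len(xi)
np.fill_diagonal(W, 0.0)

def settle(state, sweeps=5):
    """Update each neuron in turn (a deterministic sweep), repeated a fixed,
    generous number of times in place of a proper convergence test."""
    s = np.array(state, dtype=float)
    for _ in range(sweeps):
        for i in range(len(s)):
            s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    return s

print(settle([1, -1, -1]))   # one spin flipped from xi  -> recalls ( 1,  1, -1)
print(settle([1, -1, 1]))    # two spins flipped from xi -> flows to (-1, -1,  1)
```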
The advantage of transforming the problem of neural networks into one of random magnets was that one could now think of an energy landscape for the brain, with the memory states corresponding to valleys (low-energy states) separated from all other memory states by high-energy “ridges”, such that each memory had a basin of attraction around it corresponding to all the sensory stimuli that eventually result in recalling that memory. Thus, Hopfield gave a very simple physical explanation for associative memory – the process by which, for example, the smell of freshly baked bread can give rise to a Proustian chain of thoughts that makes us remember an event in our childhood when we used to walk past the local bakery in the arms of our mother (say). As shown by the Hopfield model, this is no more or less than the brain state starting in a high-energy state (smelling the bread) and gradually flowing down to the neighbouring lowest-energy state (the memory of our childhood). Connecting memory storage and recall in the brain to energy landscapes made physicists working in statistical physics enter brain science in droves almost overnight – a process that was dubbed the “Hopfield revolution” – where they solved problems that previously couldn't even be quantified, such as how memory capacity changes with network size (measured in terms of the number of neurons N comprising it).
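The capacity question can even be probed with a throwaway numerical experiment, sketched below: store p random patterns in an N = 200 network with the outer-product rule and check what fraction of the stored bits survive a single update. (Detailed statistical-physics calculations put the number of reliably retrievable patterns at roughly 0.14N; the crude test here merely illustrates how retrieval degrades as more patterns are crammed in.)

```python
import numpy as np

def fraction_of_stable_bits(N, p, rng):
    """Store p random patterns in an N-neuron network with the outer-product
    rule and return the fraction of stored bits left unchanged by one update."""
    patterns = rng.choice([-1.0, 1.0], size=(p, N))
    W = patterns.T @ patterns / N
    np.fill_diagonal(W, 0.0)
    updated = np.sign(patterns @ W)          # one synchronous update of every bit
    return float(np.mean(updated == patterns))

rng = np.random.default_rng(1)
N = 200
for p in (10, 20, 30, 40, 60):
    print(p, "patterns:", round(fraction_of_stable_bits(N, p, rng), 3))
```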
Figure 5. For a Hopfield network with a single stored pattern, there are only two attractors (as seen in Figure 4), which also happen to be the energy minima for the system seen as a globally connected network of binary-state spins (i.e., a spin can at any time only be in one of two possible orientations, viz., 1 or –1 – such spins are often referred to as Ising spins in the physics literature). This can be shown in terms of an analogy with a landscape comprising two deep valleys separated by ridges (left), the state of the system corresponding to the white ball rolling downhill in whichever valley it is located in. The energy-landscape description of the model network is thus identical to that of a ferromagnetic system – e.g., a magnetized piece of iron – under a suitable mapping of spin states (known technically as a gauge transformation). If more than one pattern is stored as a memory, then other attractors appear – represented as valleys in the contour diagram (right) that shows an analogy with a physical landscape. The system state will converge to the nearest energy minimum, shown in the analogy through arrows representing the flow of a ball down to a valley bottom from different initial conditions; the state at which the system ends up depends on the basin of attraction in which the initial state was located.
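The gauge transformation mentioned in the caption is easy to check numerically: with a single stored pattern ξ, “gauging” each spin by the corresponding entry of ξ (Si → ξi Si) turns the Hopfield energy into that of a uniform ferromagnet. The snippet below is only a sketch of that check, with the 1/N normalization of the couplings an arbitrary choice made here.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 6
xi = rng.choice([-1.0, 1.0], size=N)         # the single stored pattern

W_hopfield = np.outer(xi, xi) / N            # Hopfield couplings for this pattern
W_ferro = np.ones((N, N)) / N                # uniform ferromagnetic couplings
np.fill_diagonal(W_hopfield, 0.0)
np.fill_diagonal(W_ferro, 0.0)

def energy(W, s):
    return -0.5 * s @ W @ s

s = rng.choice([-1.0, 1.0], size=N)          # any spin configuration
# The two energies agree once the spins are "gauged" by the pattern: s_i -> xi_i * s_i
print(energy(W_hopfield, s), energy(W_ferro, xi * s))
```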
Hopfield’s model, however, had severe limitations in terms of how closely it followed the biological architecture of the brain. In particular, the brain has a clearly hierarchical structure, as sensory neurons respond to external stimuli, which then activate layers of interneurons one after another, before eventually causing motor neurons to activate muscles. Geoffrey Hinton, working with collaborators such as Terrence Sejnowski, realized that not all neurons need be involved in the reception of stimuli and/or readout of memory. Thus, by dividing the neurons of the network into “visible” (whose states encode the memorized pattern) and “hidden” (which perform computations but are not directly accessible for input or output), and by allowing probabilistic evolution of the states (to take into account the inherent fluctuations in the biological processes going on inside the brain), Hinton came up with the Boltzmann machine model, named after Ludwig Boltzmann, one of the founders of statistical physics.
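The essential new ingredient is that the updates become probabilistic: instead of switching deterministically, a unit turns ON with a probability given by the logistic function of its total input, so the network explores the energy landscape rather than getting stuck in the nearest minimum. Below is a minimal sketch of one such sweep, assuming 0/1 units, symmetric weights with zero diagonal, and unit “temperature”.

```python
import numpy as np

def gibbs_sweep(W, b, s, rng):
    """One sweep of probabilistic updates over all units of a Boltzmann machine
    with symmetric weights W (zero diagonal), biases b and binary 0/1 states s."""
    for i in range(len(s)):
        activation = W[i] @ s + b[i]                 # total input to unit i
        p_on = 1.0 / (1.0 + np.exp(-activation))     # logistic probability of firing
        s[i] = 1.0 if rng.random() < p_on else 0.0
    return s

# Example: a tiny 4-unit machine with random symmetric couplings, sampled for a few sweeps.
rng = np.random.default_rng(3)
W = rng.normal(size=(4, 4)); W = (W + W.T) / 2; np.fill_diagonal(W, 0.0)
b = np.zeros(4)
s = rng.choice([0.0, 1.0], size=4)
for _ in range(10):
    s = gibbs_sweep(W, b, s, rng)
print(s)
```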
Subsequently, Hinton combined a layered architecture, in which connections exist only between successive layers (inspired by the hierarchical structure of the brain), with the energy-landscape framework of Hopfield to develop Restricted Boltzmann Machines. The rest, as they say, is history, as this allowed networks with arbitrary numbers of layers to be trained – by altering their connection weights – to perform any given function. The resulting birth of the deep learning paradigm is, as we are all well aware, radically transforming society through the AI revolution. But just as a journey of a thousand miles begins with a single step, these impressive advances in machine learning owe their existence to Hopfield stepping back a bit from the biology to simplify the problem of learning and recall, which allowed him to bring to bear the well-developed machinery of the statistical physics of spin models and random magnets, and thus bridge the apparently widely separated fields of neuroscience, physics and machine learning. More broadly, Hopfield and Hinton’s work raises the question of whether living systems are really in any way significantly different from physical systems, or whether this perceived distinction is just a holdover from a belief in the existence of some special “vital force” being responsible for life and consciousness. Would we in our lifetime see the first sentient AI emerging, possibly uttering the words “I compute, therefore I am”?
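For the technically curious, here is a rough sketch of the kind of training step that this restricted connectivity makes cheap: because visible and hidden units connect only to each other, each layer can be sampled in one shot given the other, and the weights nudged with the contrastive-divergence recipe associated with Hinton. The variable names, learning rate and single reconstruction step are illustrative choices, not a reproduction of any published code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, a, b, v0, rng, lr=0.1):
    """One contrastive-divergence (CD-1) update. W: visible-to-hidden weights,
    a: visible biases, b: hidden biases, v0: a batch of binary data vectors (rows)."""
    # Up: sample the hidden layer in one shot, given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Down and up again: a one-step reconstruction of the data.
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Nudge parameters towards the data statistics and away from the model's own.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / len(v0)
    a += lr * (v0 - v1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```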
This is a revised, augmented version of an article by the author that appeared in Deccan Herald newspaper on October 16, 2024.