The 2024 Nobel Prize for Physics

R. Ramanujam, Azim Premji University, Bangalore

The 2024 Nobel Prize for Physics has been awarded to John J. Hopfield and Geoffrey Hinton "for foundational discoveries and inventions that enable machine learning with artificial neural networks". Hopfield is a 91-year-old American physicist and professor at Princeton University, USA. Geoffrey Hinton is a 77-year-old British-Canadian computer scientist and professor at the University of Toronto, Canada.

Machine learning with artificial neural networks? Can machines learn? Neural networks have to do with neurons and the human brain, don't they? Can we have artificial neurons? Does this mean that a machine built with artificial neurons can learn like human beings? What can they learn? These are natural questions that arise from simply hearing what the Nobel prize was given for. Another one, arising from curiosity, is what all this has to do with physics.

The human brain

Firstly, how do we human beings learn? The human brain routinely acquires a great deal of information and manages to store much of it in memory. This helps the brain notice things, and thus gain new information. Moreover, as we put in effort to learn new things, the information in memory is re-organized and updated. Thus the brain accumulates and processes information in three main ways: by noticing the world around it, through memory, and through the effort it puts into learning new things. The human brain can also think, using the stored information for various purposes.

Human and animal brains are made up of neurons. Neurons are connected to each other across tiny gaps called synapses. When a neuron *fires*, it sends signals (called impulses) out through synapses to other neurons. Each synapse has a "weight factor" which determines how much of the signal from the firing neuron gets through the connection to the next neuron. The next neuron may have many incoming synapses, just as it has many outgoing ones. If the signals arriving at a neuron from its incoming synapses add up to a certain level, called the threshold, then this neuron fires in turn and sends out signals to the other neurons it is connected to.

How does all this activity of interconnected neurons result in memory? If the weights are adjusted in specific ways, encountering a particular signal may activate one particular neuron and no other. A popular example is this: when you see your grandmother, a specific "grandmother neuron" fires in your brain. What is interesting is that when you see someone who looks like your grandmother, not only that neuron but many other neurons fire, recalling your grandmother without mistaking the other person for her. What is amazing is that it need not even be seeing someone: you are walking down a street and the smell of sambar cooking in some kitchen evokes memories of your grandmother. Here again a collection of neurons fires, in a process we refer to as "associative memory". The human brain makes very deep associations of this kind.

Neural networks

Machine learning, in this context, means building a model of neurons and synapses in a computer. A neural network in a computer may have many layers upon layers of neurons, and each pair of adjacent layers may be interconnected by large numbers of synapses. The idea of Artificial Neural Networks (ANNs) is to build such networks of neurons, where the basic unit is a simple processing node. An ANN consists of thousands or even millions of simple processing nodes that are densely interconnected.
Most of today's neural nets are organized into layers of nodes, and they are "feed-forward", meaning that data moves through them in only one direction. An individual node might be connected to several nodes in the layer beneath it, from which it receives data, and several nodes in the layer above it, to which it sends data. To each of its incoming connections, a node assigns a number known as a "weight". When the network is active, the node receives a different data item (a different number) over each of its connections and multiplies it by the associated weight. It then adds the resulting products together, yielding a single number. If that number is below a "threshold" value, the node does not pass any data to the next layer. If the number exceeds the threshold value, the node "fires", which means sending the number, the sum of the weighted inputs, along all its outgoing connections.

How do we determine how to set the weights of all the interconnecting synapses? Suppose that we are building an ANN for recognising images. The end goal is that when the input layer of neurons is shown a rectangular grid of pixels from a camera, particular output neurons on the far side of the network will fire. Suppose one output neuron fires to indicate that the picture shows a child, another to indicate that there is a puppy in the picture, and another that the child is standing next to a chair. If we show the input layer a different picture, output neurons may fire indicating that the image has a child and a puppy, but no chair.

How do all the interconnecting synapse weights get set? By *training* the ANN, using a technique called "back propagation". How does it work? You show the network an image, and at the output neurons you supply the correct answer: this picture has a child, a puppy and a chair. The difference between what the network actually produced and what it should have produced is then sent backwards through the network, from the output towards the input, and the weights are nudged to reduce that difference. What you are doing is training the network on labeled images: you choose certain output neurons to be labels that represent things you would want to know about in an image. Each exposure to a labeled image, with the errors propagated back, gradually fine-tunes the weights within the network.

Roughly speaking, it can work the other way as well. Run it backward: put in signals at the outputs indicating that you want a picture of a child driving a car, and on the "input" side you will get the pixels of such an image. Even though that image was never presented to the network before, it can "generate" such an image. This is basically how Artificial Intelligence (AI) programs can create nonexistent or "fake" images.

History of ANNs

Neural networks were first proposed in 1943 by Warren McCulloch and Walter Pitts. They were a major area of research in both neuroscience and computer science until 1969. They enjoyed a resurgence in the 1980s, fell into eclipse again in the first decade of the new century, and have returned with a Big Bang in the second, fuelled largely by the increased processing power of graphics chips. (Computer chips, made of microscopic electronic components, are the parts of a computer that process data. As the name indicates, graphics chips (GPUs) are fast processing hardware designed to render and display images, videos, and other visuals, tasks that usually involve a lot of memory and processing.) What McCulloch and Pitts showed was that a neural net could, in principle, compute any function that a digital computer could.
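To make this concrete, here is a minimal sketch, not taken from the article above, of a single threshold node of the kind McCulloch and Pitts described: it multiplies its inputs by weights, adds them up, and fires only if the total reaches a threshold. With suitable weights, such a node behaves like a logic gate, which is one way to see why networks of such nodes can, in principle, compute anything a digital computer can. The function names and numbers here are purely illustrative.

```python
# A minimal sketch of a McCulloch-Pitts style threshold node (illustrative only).
def threshold_node(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With suitable weights and thresholds, a single node acts like a logic gate.
def AND(a, b):
    return threshold_node([a, b], weights=[1, 1], threshold=2)

def OR(a, b):
    return threshold_node([a, b], weights=[1, 1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
```

Modern ANNs replace the hard threshold with smoother functions, so that small changes in the weights produce small changes in the output; that is what makes gradual training by back propagation possible.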
Their point was to suggest that the human brain could be thought of as a computing device. ANNs are important for research in neuroscience. For instance, particular network layouts, or rules for adjusting weights and thresholds, have been able to reproduce observed features of human neurology and cognition. This indicates that they capture something about how the brain processes information. The first trainable neural network, the Perceptron, was demonstrated by Frank Rosenblatt in 1957. The Perceptron's design had only one layer with adjustable weights and thresholds, sandwiched between input and output layers.

Hopfield's work

Imagine that you are trying to remember a song. You can hum a part of it, and can even remember something about its meaning, but you cannot recollect how the song begins. You search your memory. Suddenly some other word occurs to you that has nothing to do with the song directly, and that triggers your memory and you start singing! This process is related to the associative memory that the physicist John Hopfield discovered in 1982. The Hopfield network can store patterns and has a method for recreating them. When the network is given an incomplete or slightly distorted pattern, the method can find the stored pattern that is most similar.

How did Hopfield start thinking about this? He had previously used his background in physics to explore theoretical problems in molecular biology. When he was invited to a meeting on neuroscience, he encountered research into the structure of the brain. He was fascinated by what he learned and started to think about the dynamics of simple neural networks. When neurons act together, they can give rise to new and powerful characteristics that are not apparent to someone who only looks at the network's separate components.

Hopfield particularly benefitted from having learned about magnetic materials that have special characteristics thanks to their atomic spin, a property that makes each atom a tiny magnet. These spins are usually randomly distributed in a material and can point in any direction. But the spins of neighbouring atoms affect each other, and this can allow regions to form in the material with spins pointing in the same direction.

The network that Hopfield built has nodes that are all joined together via connections of different strengths. Each node can store an individual value: in Hopfield's first work this could be either 0 or 1, like the pixels in a black and white picture. Hopfield described the overall state of the network with a special property that is common in physics problems: it is equivalent to the total energy in the spin system. This energy is calculated using a formula that uses all the values of the nodes and all the strengths of the connections between them.

The Hopfield network is programmed by feeding an image to the nodes, which are given the value of black (0) or white (1). The network's connections are then adjusted using the energy formula, so that the saved image has low energy. This is also a common approach in physics: many physical systems settle into their lowest-energy configuration; for instance, soap bubbles are spherical because that minimises their surface energy.

Now comes the "remembering" component. When another pattern is fed into the network, it checks whether the network would have lower energy if the value of any node were changed. If so, it changes the colour of that node/pixel. This procedure continues until it is impossible to find any further improvements, as the sketch below illustrates.
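Here is a minimal sketch of this store-and-recall procedure, written for illustration and not taken from Hopfield's own papers. It uses node values +1 and -1 in place of 1 and 0 (a standard, equivalent choice), a simple rule that strengthens connections between nodes that agree in the stored pattern, and exactly the flip-a-node-if-it-lowers-the-energy loop described above. All names and numbers are made up for the example.

```python
# A small illustrative sketch of a Hopfield-style associative memory.
# Pixels are represented as +1 (white) and -1 (black) instead of 1 and 0.

def train(patterns):
    """Hebbian-style rule: strengthen connections between nodes that agree."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def energy(w, s):
    """The network's 'energy': low when node values fit the connection strengths."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

def recall(w, s):
    """Flip any single node that lowers the energy, until no flip helps."""
    s = list(s)
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            trial = list(s)
            trial[i] = -trial[i]
            if energy(w, trial) < energy(w, s):
                s, improved = trial, True
    return s

# Store one 8-pixel pattern, then recover it from a distorted copy.
stored = [1, 1, -1, -1, 1, -1, 1, -1]
w = train([stored])
noisy = list(stored)
noisy[2], noisy[5] = -noisy[2], -noisy[5]   # corrupt two pixels
print(recall(w, noisy) == stored)           # prints True: the pattern is recovered
```

With more nodes, the same network can store several patterns at once, and a distorted input settles into whichever stored pattern it is closest to.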
When no further improvement is possible, the network has often reproduced the original image on which it was trained. The network has learned to identify the image. Amazing, isn't it? That neural networks could exhibit memory was a bold and powerful idea, and it laid the foundation for the use of ANNs in many areas later on.

In comes Hinton

Remembering an image is one thing, but interpreting what it depicts requires a little more. Even very young children can point at an animal and confidently say whether it is a dog, a cat, or a squirrel. They might get it wrong occasionally, but fairly soon they are correct almost all the time. A child can learn this even without seeing any diagrams or explanations of concepts such as species or mammals. After encountering a few examples of each type of animal, the different categories fall into place in the child's head. People learn to recognise a cat, or understand a word, or enter a room and notice that something has changed, by experiencing the environment around them.

When Hopfield published his article on associative memory, Geoffrey Hinton was working at Carnegie Mellon University in the USA. He had previously studied experimental psychology and artificial intelligence in England and Scotland. He wondered whether machines could learn to process patterns in a similar way to humans, and in particular whether they could find their own categories for sorting and interpreting information. Along with his colleague Terrence Sejnowski, Hinton started from the Hopfield network and expanded it to build something new, using ideas from statistical physics.

The Boltzmann machine

What is statistical physics? Statistics is the study of large data sets. For example, you may not be able to ask every individual person in India about their preferred choice of soap, but you can sample small parts of the population in different cities, small towns, and villages, and get a good idea. In physics, we know that a gas, for instance, contains an unimaginably large number of molecules. It is difficult (impossible!) to tell how an individual molecule will behave, but we can still find out some of their common features: for example, that water vapour will turn into liquid water on cooling, and then into ice. There are many different ways in which the molecules can arrange themselves to show this collective behaviour; each is called a state of the system. The probability of any of these states occurring can be analysed using statistical physics. Some states are more probable than others; this depends on the amount of available energy, which is described in an equation by the nineteenth-century physicist Ludwig Boltzmann.

Hinton's network utilised that equation, and the method was published in 1985 under the striking name of the *Boltzmann machine*. The Boltzmann machine can learn: not from instructions, but from being given examples. A trained Boltzmann machine can recognise familiar traits in information that it has not previously seen. Imagine meeting your friend's brother or sister: you can immediately see that they must be related. In a similar way, the Boltzmann machine can recognise an entirely new example if it belongs to a category found in the training material, and differentiate it from material that is dissimilar.

Today's ANNs are often enormous and constructed from many layers. These are called deep neural networks, and the way they are trained is called *deep learning*. The next big breakthrough was the idea of *transformers*, which has led to today's generative Artificial Intelligence.
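Returning briefly to the name of the machine: Boltzmann's equation says that the probability of a state falls off exponentially as its energy rises. The short sketch below, with a made-up three-node network and made-up connection strengths, illustrates only this idea; it is not the learning algorithm of an actual Boltzmann machine.

```python
# Illustrative sketch of the Boltzmann distribution: states with lower energy
# are exponentially more probable.
import math
from itertools import product

def energy(w, s):
    """Same kind of energy formula as in the Hopfield sketch above."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))

# A tiny network of 3 nodes with fixed, made-up connection strengths.
w = [[0, 2, -1],
     [2, 0, 1],
     [-1, 1, 0]]
T = 1.0   # "temperature": higher T makes all states more equally likely

states = list(product([-1, 1], repeat=3))
factors = [math.exp(-energy(w, s) / T) for s in states]
Z = sum(factors)                      # normalising constant
for s, f in zip(states, factors):
    print(s, "energy:", energy(w, s), "probability:", round(f / Z, 3))
```

Training a Boltzmann machine amounts to adjusting the connection strengths so that the states corresponding to the training examples become the probable ones.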
Today's AI revolution

Hinton and Hopfield's work in the 1980s laid the foundation for today's AI revolution. Many successful applications were created in the 1990s, but it remained a challenge to train deep, multilayered networks with many connections between consecutive layers. To many researchers in the field, training such dense multilayered networks seemed out of reach.

The situation changed in the 2000s. A leading figure in this breakthrough was Hinton, and an important tool was the restricted Boltzmann machine (RBM). Hinton and coworkers developed a pre-training procedure for multilayer networks, in which the layers are trained one by one using an RBM. By linking layers pre-trained in this way, Hinton was able to successfully implement examples of deep and dense networks, a milestone toward what is now known as deep learning. These networks are behind almost everything we do with AI today, such as image recognition, language generation, the analysis of mammographic screening images and MRI scans, and much more.

Machine learning in physics

While physics has been a driving force behind the invention and development of ANNs, ANNs are, conversely, playing an increasingly important role as a powerful tool for modelling and analysis in almost all of physics. When we do not know a function explicitly, we can use machine learning (ML) to approximate it from examples; a small sketch at the end of this section shows what this means in practice. Significant advances have been achieved in this way, for instance in the study of quantum mechanical many-body problems. ANNs have helped in the prediction of new photovoltaic materials. They have been used to discover new thermodynamical properties of water. They have made it possible to reach higher resolutions in explicit physics-based climate models.

ANNs improved the sensitivity of searches for the Higgs boson at the CERN Large Electron Positron (LEP) collider during the 1990s, and they were used in the analysis of data that led to its discovery at the CERN Large Hadron Collider in 2012. ANNs were also used in studies of the top quark at Fermilab. In astrophysics and astronomy, ANNs have become a standard data analysis tool. A recent example is an ANN-driven analysis of data from the IceCube neutrino detector at the South Pole, which resulted in an image of the Milky Way made not with radio or optical telescopes but with particles called neutrinos. The Event Horizon Telescope image of the black hole at the centre of the Milky Way also used ANNs for data processing. So far, the most spectacular scientific breakthrough using deep learning ANN methods is the AlphaFold tool, which predicts three-dimensional protein structures from their amino acid sequences. This got the 2024 Nobel prize for Chemistry! See the article on the Nobel prize in Chemistry in the same issue of JM.
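Here is the promised sketch of what "approximating an unknown function from examples" means. The network size, the learning rate, and the function being learned (a sine curve standing in for some unknown physical relationship) are all made up for illustration; the weight adjustments follow the back-propagation idea described earlier.

```python
# An illustrative sketch: a tiny neural network learns to approximate a function
# purely from examples of inputs and outputs.
import math
import random

random.seed(0)
H = 12                                             # number of hidden nodes
w1 = [random.uniform(-1, 1) for _ in range(H)]     # input -> hidden weights
b1 = [random.uniform(-1, 1) for _ in range(H)]     # hidden thresholds (biases)
w2 = [random.uniform(-1, 1) for _ in range(H)]     # hidden -> output weights
b2 = 0.0

def predict(x):
    """One pass through the network: weighted sums with a smooth 'firing' function."""
    hidden = [math.tanh(w1[i] * x + b1[i]) for i in range(H)]
    output = sum(w2[i] * hidden[i] for i in range(H)) + b2
    return output, hidden

# Examples of the "unknown" function (here, secretly, the sine function).
data = [(x / 10, math.sin(x / 10)) for x in range(-30, 31)]

lr = 0.01                                          # learning rate
for epoch in range(2000):
    for x, y in data:
        out, hidden = predict(x)
        err = out - y                              # how wrong the network is
        for i in range(H):                         # back-propagate the error
            grad = err * w2[i] * (1 - hidden[i] ** 2)
            w2[i] -= lr * err * hidden[i]
            w1[i] -= lr * grad * x
            b1[i] -= lr * grad
        b2 -= lr * err

print(round(predict(1.5)[0], 3), "vs", round(math.sin(1.5), 3))   # close after training
```

In real applications the "unknown function" might map a material's composition to its properties, or detector signals to the particles that produced them; the principle is the same.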
Hinton's warnings

Few Nobel laureates have expressed regret over the consequences of their own prize-winning work, and none before they won the coveted prize. Geoffrey Hinton, often called the "Godfather of AI", did just that: in May 2023 he resigned from his role at Google so that he could speak more freely about the "dangers" posed by AI. In an interview with the New York Times, he said that a part of him "regrets his life's work". In Hinton's view, ANNs had suddenly become "a new and better form of intelligence", and he thinks it would not be too much of a leap to expect AI systems to soon take over control. Moreover, AI machines can almost instantly "teach" and transmit their entire knowledge to other connected machines, something that happens slowly (and with errors) in animal brains. He expressed concern that AI could fall into the "wrong hands".

Sources: Several, including www.nobelprize.org