The keyword in “Artificial Neural Network” (ANN) is network. ANNs were introduced in the 1940s, with the idea that simple mathematical functions, called perceptrons at the time, could, when interconnected, do what computers are only now starting to do: generalize concepts. If you think about the history of computing, it was not the right moment for thinking about “distributed computing”. The von Neumann architecture (the one we have been using till now: CPU (Central Processing Unit), RAM, etc.) is a serial model of computing. That means networks of small computing functions, the perceptrons, are hard to implement: given a number of inputs to a set of perceptrons, the machine will evaluate one perceptron, then another one, then compute how each of these perceptrons influences other perceptrons, evaluate those, see how they influence the others, and so on.
But the pioneers of computing networks had a point (of course). Think of an ant colony. A single ant is not smart, but thousands of them are, though only if they can communicate. Communication, what networks need to be successful, what networks actually are, is also why networks of perceptrons were hardly the right solution at the dawn of computer science. At the time, having small computing units communicating with each other was out of the question. Much better to create a single “decently smart ant”, the CPU, and leave communication for future development. Communication remains, nonetheless, the key: if you want to create something as good as our brain, you need billions of computing elements communicating with each other.
That is now possible thanks to GPUs (Graphics Processing Units). GPUs, unlike CPUs, take grids of numbers and perform computations on all of them at the same time. Imagine having two grids of 1,000 x 1,000 elements and wanting to multiply each element of one grid by the corresponding element of the second (the so-called Hadamard product). That makes one million operations. Well, a GPU can grab the two million elements and perform all the multiplications at the same time.
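To make the Hadamard product concrete, here is a minimal sketch in Python using NumPy (the grid sizes are the ones from the example above; NumPy dispatches the whole grid as one vectorized operation, which is exactly the kind of work a GPU parallelizes):

```python
import numpy as np

# Two 1,000 x 1,000 grids of random numbers.
a = np.random.rand(1000, 1000)
b = np.random.rand(1000, 1000)

# The Hadamard (element-wise) product: one million multiplications,
# expressed as a single operation instead of a million-step loop.
c = a * b

print(c.shape)  # (1000, 1000)
```

On a CPU, NumPy still runs this far faster than an explicit Python loop; on a GPU (via libraries such as CuPy or JAX), the same one-line expression runs the multiplications in parallel.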
Why has this been a breakthrough for ANNs? Because the mathematical representation of a network is a grid, which scientists call a matrix. The representation is quite simple: the n-th number of the m-th row indicates the weight of the connection between the n-th element and the m-th element in the network. A “0” means no connection; a high number means the receiving element “feels” strongly what the sender sends.
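A tiny, made-up example of this representation (the four elements and the weight values are illustrative, not from any real network): row m, column n holds the weight of the connection from element n to element m, and one matrix-vector product tells every element at once how strongly it “feels” the others.

```python
import numpy as np

# W[m, n] = weight of the connection from element n to element m.
# A 0 means no connection; a high number means element m feels
# element n strongly.
W = np.array([
    [0.0, 0.9, 0.0, 0.1],   # element 0 feels element 1 strongly
    [0.0, 0.0, 0.0, 0.0],   # element 1 receives nothing
    [0.2, 0.0, 0.0, 0.0],
    [0.0, 0.5, 0.3, 0.0],
])

# The signals the four elements are currently sending.
signals = np.array([1.0, 2.0, 0.5, 1.0])

# What each element receives: a single matrix-vector product.
received = W @ signals
print(received)  # element 0 receives 0.9*2.0 + 0.1*1.0 = 1.9
```

This is why GPUs matter for ANNs: evaluating how every element influences every other collapses into one grid operation.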
How do neural networks work?
This is how, simplifying a lot, neural networks work. The inputs are sent to a set of perceptrons in the network, which are now more complex than the ones introduced in the 1950s and are aptly called neurons. These neurons apply a weight to each of the inputs, effectively creating a connection with all the input channels. The output is “massaged” and sent to a new set of neurons. These do the same (weight, sum, massage) and send the results to a new set, and so on, until we reach a small set of neurons, each of which represents a certain category of the inputs. An example: take the pixels of a picture as input, and have all pictures of trees make the same neuron in the last set give a big signal, and all the others a small one.
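The layer-by-layer flow described above (weight, sum, massage, pass it on) can be sketched in a few lines of NumPy. Everything here is a toy assumption: the layer sizes, the random weights, and the choice of ReLU as the “massaging” function; a real network would learn its weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def massage(x):
    # One common "massaging" (activation) function: ReLU,
    # which keeps positive signals and zeroes out negative ones.
    return np.maximum(0.0, x)

# Hypothetical sizes: 784 input pixels (a 28x28 picture, flattened),
# two intermediate sets of neurons, and 10 category neurons at the end.
sizes = [784, 128, 64, 10]
weights = [rng.normal(0, 0.1, (m, n)) for n, m in zip(sizes[:-1], sizes[1:])]

x = rng.random(784)           # the input pixels
for W in weights:
    x = massage(W @ x)        # weight, sum, massage, send onward

print(x.shape)                # ten numbers, one per category
print(int(np.argmax(x)))      # the category giving the biggest signal
```

With trained weights, the “tree” neuron in the last set would fire strongly for pictures of trees; with these random weights the output is meaningless, but the flow of signals is the same.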
The whole point is finding the values of the numbers weighing the connections so that the network always categorizes in the correct way. Which, in mathematical lingo, means “find the right topology of the network”, or “the way the elements are connected”.
In the next post we’ll see how nature and computers found that certain topologies are favoured: regardless of whether the network is a network of mathematical functions (the perceptrons/neurons) or of biological neurons, the way the elements connect is similar…
(1945) John von Neumann, “First Draft of a Report on the EDVAC”.
(1947) Walter Pitts and Warren S. McCulloch, “How we know universals: the perception of auditory and visual forms,” Bulletin of Mathematical Biophysics 9:127-147.
(1958) F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychological Review, 65:386-408.