At Turintech, we research and identify optimisation and machine learning techniques to help customers solve their problems. AI, or Artificial Intelligence, has become a very commonly applied term today, and it is often used ambiguously, or even incorrectly, to describe deep learning and machine learning. In this article, we will set out some simplified definitions of what we actually have today, then explain these techniques in more detail, where they fall short, and some steps towards creating more fully capable ‘AI’ with new architectures.

Machine Learning – Fitting functions to data, and using the functions to group it or predict things about future data. (greatly oversimplified)

Deep Learning – Fitting functions to data as above, where those functions are layers of nodes that are connected (densely or otherwise) to the nodes before and after them, and the parameters being fitted are the weights of those connections.
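To make the “fitting functions to data” idea concrete, here is a minimal, illustrative sketch in Python/NumPy, not production code: a tiny one-hidden-layer network fitted to noisy samples of a toy function with hand-written gradient descent. The data, layer sizes, and learning rate are arbitrary placeholders.

```python
# Minimal sketch of "fitting a function to data": a one-hidden-layer network
# trained with plain gradient descent on toy data (all values illustrative).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))                 # toy inputs
y = np.sin(3 * X) + 0.1 * rng.normal(size=X.shape)    # noisy target function

W1 = rng.normal(scale=0.5, size=(1, 16)); b1 = np.zeros(16)   # hidden layer
W2 = rng.normal(scale=0.5, size=(16, 1)); b2 = np.zeros(1)    # output layer

lr = 0.05
for step in range(2000):
    h = np.tanh(X @ W1 + b1)          # hidden activations
    pred = h @ W2 + b2                # network output
    err = pred - y                    # prediction error
    # backpropagate the squared-error loss by hand
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h**2)
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

print("final mean squared error:", float((err**2).mean()))
```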

Deep Learning is what usually gets called AI today, but it is really just very elaborate pattern recognition and statistical modelling. The most common techniques / algorithms are Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Reinforcement Learning (RL).

Convolutional Neural Networks (CNNs) have a hierarchical structure (which is usually 2D for images), where an image is sampled by (trained) convolution filters into a lower-resolution map that represents the value of the convolution operation at each point. For images, the hierarchy goes from high-resolution pixels, to fine features (edges, circles, …), to coarse features (noses, eyes, lips, … on faces), and then to fully connected layers that identify what is in the image. The cool part of CNNs is that the convolution filters are randomly initialized, so when you train the network, you are actually training the convolution filters. For decades, computer vision researchers had hand-crafted filters like these, but could never get results as accurate as CNNs can. Additionally, the output of a CNN can be a 2D map instead of a single value, giving us an image segmentation. CNNs can also be used on many other types of 1D, 2D and even 3D data.
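As a rough illustration of what a convolution filter does (a toy sketch, not how a real CNN framework implements it), the code below slides a single 3×3 kernel over an image to produce a lower-resolution feature map. In a CNN, those kernel values start random and are learned during training; the hand-crafted edge filter here just stands in for them.

```python
# Illustrative sketch: one convolution filter sliding over an image to produce
# a feature map. In a CNN this 3x3 kernel starts random and is learned.
import numpy as np

def conv2d(image, kernel):
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)  # dot product at each position
    return out

rng = np.random.default_rng(0)
image = rng.random((28, 28))                   # stand-in for a grayscale image
edge_kernel = np.array([[-1., 0., 1.],         # a hand-crafted vertical-edge filter,
                        [-1., 0., 1.],         # the kind of filter CNNs learn automatically
                        [-1., 0., 1.]])
feature_map = conv2d(image, edge_kernel)
print(feature_map.shape)                       # (26, 26): a lower-resolution map
```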

Recurrent Neural Networks (RNNs) work well for sequential or time-series data. Basically, each ‘neural’ node in an RNN acts as a kind of memory gate, often an LSTM, or Long Short-Term Memory, cell. When these are linked up in layers of a neural net, the cells/nodes also have recurrent connections looping back into themselves, so they tend to hold onto information that passes through them, retaining a memory and allowing the network to process not only current information but past information as well. As such, RNNs are good for time-sequential operations like language processing and translation, as well as signal processing, text-to-speech, speech-to-text, and so on.
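The recurrence itself is easy to sketch. Below is a toy vanilla RNN cell in NumPy; an LSTM adds gating on top of the same idea, which is omitted here for brevity, and the sizes and data are arbitrary placeholders.

```python
# Sketch of recurrence: a vanilla RNN cell carries a hidden state (its "memory")
# from one time step to the next. A real LSTM adds gates around the same idea.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden = 8, 16
Wx = rng.normal(scale=0.1, size=(n_in, n_hidden))      # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))  # recurrent hidden-to-hidden weights
b = np.zeros(n_hidden)

sequence = rng.normal(size=(20, n_in))   # 20 time steps of toy input
h = np.zeros(n_hidden)                   # initial memory is empty
for x_t in sequence:
    h = np.tanh(x_t @ Wx + h @ Wh + b)   # new state mixes current input with past state
print(h[:4])                             # the final state summarises the whole sequence
```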

Reinforcement Learning is a third main DL method, where you train a learning agent to solve a complex problem by simply taking the best actions given a state, with the probability of taking each action at each state defined by a policy. An example is running a maze, where the position of each cell is the ‘state’, the 4 possible directions to move are the actions, and the probabilities of moving in each direction from each cell (state) form the policy.

You repeatedly run through the states and possible actions, rewarding the sequences of actions that gave a good result (by increasing the probabilities of those actions in the policy) and penalizing the actions that gave a negative result (by decreasing their probabilities in the policy). In time you arrive at an optimal policy, the one with the highest probability of a successful outcome. Usually, while training, you discount the rewards and penalties for actions further back in time.

In our maze example, this means letting an agent go through the maze, choosing a direction to move from each cell using the probabilities in the policy. When it reaches a dead end, we penalize the series of choices that got it there by reducing the probability of moving in that direction from each of those cells again. If the agent finds the exit, we go back and reward the choices that got it there by increasing the probabilities of moving in that direction from each cell. In time the agent learns the fastest way through the maze to the exit, i.e. the optimal policy. Variations of reinforcement learning are at the core of the AlphaGo AI and the Atari video-game-playing AI.
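Here is a deliberately tiny sketch of that maze idea in Python: a tabular policy of action probabilities per cell, nudged up after episodes that reach the exit and down otherwise, with earlier actions discounted. The grid size, update factor, and discount are arbitrary choices for illustration, not a tuned RL algorithm.

```python
# Toy version of the maze example: a tabular policy over a 3x3 grid, updated by
# nudging action probabilities up after reaching the exit and down otherwise.
import numpy as np

rng = np.random.default_rng(2)
SIZE, EXIT = 3, (2, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]           # up, down, left, right
policy = np.full((SIZE, SIZE, 4), 0.25)                # uniform probabilities per cell

def run_episode(max_steps=20):
    state, trajectory = (0, 0), []
    for _ in range(max_steps):
        a = rng.choice(4, p=policy[state])
        trajectory.append((state, a))
        nxt = (state[0] + ACTIONS[a][0], state[1] + ACTIONS[a][1])
        if 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE:
            state = nxt
        if state == EXIT:
            return trajectory, True
    return trajectory, False

for episode in range(500):
    trajectory, success = run_episode()
    reward, discount = (1.0 if success else -1.0), 1.0
    for state, a in reversed(trajectory):                     # later actions get full credit,
        policy[state][a] *= np.exp(0.1 * reward * discount)   # earlier ones are discounted
        policy[state] /= policy[state].sum()                  # keep a valid probability distribution
        discount *= 0.9

print(np.round(policy[0, 0], 2))   # learned action preferences at the start cell
```

After training, the start cell’s probabilities should favour the directions that lead toward the exit, which is exactly the “optimal policy” described above.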

One last note goes to GANs, or Generative Adversarial Networks, which are more of a technique than an architecture. Currently they are most often used with CNNs to build image discriminators and generators. The discriminator is a CNN that is trained to recognize images. The generator is an inverse network that takes a random seed and uses it to generate images. The discriminator evaluates the output of the generator and sends signals to the generator on how to improve, and the generator in turn pushes the discriminator to improve its accuracy as well, going back and forth in a zero-sum game until they both converge to the best quality they can reach. This is a way of providing self-reinforcing feedback to a neural system, which we will revisit later.
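A minimal sketch of that generator/discriminator loop, written in PyTorch on toy 1-D data rather than images (the network sizes and hyperparameters are arbitrary), looks roughly like this:

```python
# Minimal GAN loop (PyTorch) on toy 1-D data instead of images, just to show the
# back-and-forth: the discriminator learns real vs. fake, the generator learns to fool it.
import torch, torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))       # noise -> sample
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0        # "real" data: a Gaussian around 3
    fake = G(torch.randn(64, 8))

    # discriminator step: label real as 1, generated as 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # generator step: try to make the discriminator output 1 for generated samples
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(5, 8)).detach().squeeze())   # samples should cluster near 3
```

The back-and-forth is visible in the two optimiser steps: the discriminator is updated on detached generator output, then the generator is updated through the discriminator’s judgement of it.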

Of course, there are rich variations and combinations of all these methods, as well as many others, and together they are the bread and butter of Deep Learning, which is what we call AI today. Perhaps prematurely: these methods have no cognition, intelligence, or intuition; they are closer to brute-force statistical analysis and pattern recognition, and they often require large amounts of (labelled) data to train to a given standard.

Although they can work very well for model problems and benchmarks, these techniques sometimes do not scale or perform as well once you try to apply them outside the specific problems they were designed for. For real-world problems they sometimes fall short, even if you scale up and redesign the network topology and tune it. Sometimes we simply do not have enough data to train them to be robust and accurate in deployment. Or the real-life problem just cannot be quantified well enough: the ImageNet image classification competition has 1,000 object classes, for example, but a real-life application probably involves millions of object classes and sub-classes. To get DL systems to do new things, or to recognize new data, they have to be constantly re-trained and re-deployed; they cannot learn in situ, on the fly, in the field.

As well, many applications require combining multiple DL techniques and finding ways to fuse them. A simple example is video tagging: you pass the video frames through a CNN, and at the top have an RNN to capture how the features in those frames behave over time. I helped a researcher/entrepreneur use this technique to recognize the facial expressions of a quadriplegic to issue commands to their wheelchair and robotic prosthesis, with a different facial expression/gesture paired with each command. It works, but as you scale it up, it can be time-consuming and tricky to develop and train, because you now have to tune two different types of DL network that are intertwined, and it is sometimes hard to know what effect your tweaks are having.
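A rough sketch of that CNN-plus-RNN pattern in PyTorch might look like the following. The layer sizes, frame counts, and class count are placeholders, not the actual system described above; the point is only to show a per-frame CNN feeding an LSTM.

```python
# Rough sketch (PyTorch) of the video-tagging idea: a small CNN encodes each frame,
# an LSTM on top captures how those features evolve over time. Sizes are arbitrary.
import torch, torch.nn as nn

class VideoTagger(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten())                               # -> 32-dim vector per frame
        self.rnn = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_classes)            # e.g. one class per gesture/command

    def forward(self, video):                           # video: (batch, time, 3, H, W)
        b, t = video.shape[:2]
        feats = self.cnn(video.flatten(0, 1)).view(b, t, -1)   # run the CNN on every frame
        _, (h, _) = self.rnn(feats)                     # last hidden state summarises the clip
        return self.head(h[-1])

model = VideoTagger()
clip = torch.randn(2, 16, 3, 64, 64)                    # 2 toy clips of 16 frames each
print(model(clip).shape)                                # -> (2, 5) class scores
```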

Now imagine you had several of these CNN/RNN networks feeding input, a deep reinforcement learning engine making decisions on that input state, and then driving generative networks to create output. That is a lot of specific DL techniques hacked together to accomplish a set of tasks. Can you say debugging and tuning hell? Will it even work? I don’t know, but if it does, it will cost a lot and take a long time to get working; you would have to be very creative to isolate and test the subsystems and progressively integrate them, and it is uncertain whether the combination could train well enough to perform in real-world conditions.

Our current DL techniques each represent a reduced subset of how the brain’s networks and our nervous system work: each is like a different 2D projection or shadow, a discretized subset of the real thing, with functionality that sorta, kinda works for certain cases. We call it ‘neural’, but it really is not, and each technique is specialized to specific tasks.

In fact, what most people practicing DL or ‘AI’ today don’t realize is that today’s ‘neural networks’ and ‘neurons’ in deep learning are just the simplest subset of a much larger and richer family of synthetic neurons, neural networks and methods. Most of the layered neural networks and CNNs we use in DL today fall into a smaller family called feed-forward neural networks, which simply sum the weighted inputs at each node, apply a simple transfer function, and pass the result to the next layer. This is not an accurate model of how the brain works by any means, and even RNNs and reinforcement learning are not giving us true artificial intelligence; they are just fitting the parameters of very large and complex functions to large amounts of data and using statistics to find patterns and make decisions.
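That entire feed-forward ‘neuron’ can be written in a few lines, which is worth seeing just to appreciate how little of a real neuron it captures:

```python
# The whole "neuron" in a feed-forward network: weight the inputs, sum them,
# squash with a transfer function. No time, no spikes, no internal state.
import numpy as np

def feed_forward_neuron(inputs, weights, bias):
    return np.tanh(np.dot(inputs, weights) + bias)   # weighted sum + transfer function

print(feed_forward_neuron(np.array([0.5, -1.0, 2.0]),
                          np.array([0.1, 0.4, -0.2]),
                          bias=0.3))
```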

Other methods, especially Spiking Neural Networks, give a more accurate model of how real neurons operate, ranging from simple, compute-efficient models like ‘Integrate and Fire’ and ‘Izhikevich’ to more complex models like ‘Hodgkin-Huxley’ that come close to modelling a biological neuron’s behavior, and to modelling how networks of them interact and work in the brain, opening up much richer neural computational models.
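For example, a leaky integrate-and-fire neuron can be simulated in a few lines. The parameter values below are illustrative, not tuned to any particular biological neuron, but the behaviour is the essential one: the membrane potential integrates input over time and emits discrete spikes.

```python
# A leaky integrate-and-fire neuron, one of the simple spiking models mentioned above:
# the membrane potential integrates input current, leaks back toward rest, and emits
# a spike (then resets) when it crosses threshold. Parameter values are illustrative.
import numpy as np

dt, tau, v_rest, v_reset, v_thresh = 1.0, 20.0, -65.0, -70.0, -50.0   # ms, mV
v, spikes = v_rest, []
input_current = np.concatenate([np.zeros(50), np.full(200, 1.8), np.zeros(50)])

for t, i_in in enumerate(input_current):
    dv = (-(v - v_rest) + 10.0 * i_in) * (dt / tau)   # leak toward rest + drive from input
    v += dv
    if v >= v_thresh:                                 # threshold crossed: fire and reset
        spikes.append(t)
        v = v_reset

print(f"{len(spikes)} spikes, first at t={spikes[0] if spikes else None} ms")
```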

In real neurons, time-domain signal pulses travel along the dendrites, arrive at the neuronal body independently, and are integrated in time and space inside it (some excite, some inhibit). When the neuronal body is triggered, it produces a time-dependent set of pulses down its axon; these split up as the axon branches and take time to travel out to the synapses, which themselves exhibit a non-linear, delayed, time-dependent integration as the chemical neurotransmitter signal passes across the synapse to eventually trigger a signal in the post-synaptic dendrite. If the neurons on both sides of a synapse fire together within a certain interval, the synapse is strengthened in the process; this is learning, and it is called Hebbian learning. We may never be able to completely replicate all the electrochemical processes of a real biological neuron in hardware or software, but we can search for models that are sophisticated enough to represent much of the useful behavior needed in our spiking artificial neural networks.

This will bring us closer to more human-like AI, because real brains get much of their computing, sensory-processing and body-controlling capability from the fact that signals ‘travel’ through neurons, axons, synapses, and dendrites, and thus through the brain structures, in complex, time-dependent circuits. These circuits can even include feedback loops that act as timers or oscillators, or neural circuits that activate in a repeatable, cascading pattern to send specific, time-dependent patterns of control signals to groups of muscles/actuators. These networks also learn by directly strengthening connections between neurons that repeatedly fire together, i.e. Hebbian learning. For more complex AI and decision-making, they are much more powerful than the CNNs, static RNNs, and even the deep reinforcement learning we used in the examples above.
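One common way the “fire together within a certain interval” rule is formalised is pair-based spike-timing-dependent plasticity (STDP). The sketch below is a generic, textbook-style version with arbitrary constants, not an implementation from any specific system:

```python
# One common formalisation of Hebbian learning for spiking networks: pair-based
# spike-timing-dependent plasticity (STDP). If the pre-synaptic neuron fires shortly
# before the post-synaptic one, strengthen the synapse; if after, weaken it.
import numpy as np

def stdp_update(w, t_pre, t_post, a_plus=0.05, a_minus=0.055, tau=20.0):
    """Update weight w given one pre-spike time and one post-spike time (ms)."""
    dt = t_post - t_pre
    if dt > 0:                      # pre fired before post: potentiate
        w += a_plus * np.exp(-dt / tau)
    else:                           # pre fired after post: depress
        w -= a_minus * np.exp(dt / tau)
    return float(np.clip(w, 0.0, 1.0))

w = 0.5
w = stdp_update(w, t_pre=10.0, t_post=15.0)   # causal pairing -> stronger synapse
w = stdp_update(w, t_pre=30.0, t_post=22.0)   # anti-causal pairing -> weaker synapse
print(w)
```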

But there is one huge drawback – there are no established methods for fitting these kinds of networks to data to ‘train’ them. There is no back-propagation, no gradient descent tuning the synaptic weights between neurons. The synapses just strengthen or weaken, so a spiking neural network learns as it goes about its business of operating, using Hebbian learning along the way. That may or may not work in practice for training our synthetic networks, because they have to be structured correctly in the first place for this to converge to a useful solution. This is an area of ongoing research, and a breakthrough here could be very significant. Below are my ideas, from the ORBAI provisional patent (US 62687179, filed Jun 19, 2018):

Next, we will describe an approximation of how we currently understand the visual cortex to work. Images from our retinas are processed into more high-level, abstract patterns, and eventually ‘thoughts’, as they move deeper through the higher levels of the visual cortex (similar to the classic CNN model), but thoughts also cascade the other direction through the visual cortex, becoming features, and eventually images at the lowest levels of the cortex, where they resemble the images on our retinas. Just pause for a minute, close your eyes, and picture a ‘fire truck’… see, it works: you can visualize, and perhaps even draw, a fire truck, and in doing so you just used your visual cortex in reverse. CNNs cannot do that. Because our visual cortex works like this, we are always visualizing what we expect to see and constantly comparing that with what we are actually seeing, at all levels of the visual cortex. Sensing is a dynamic, interactive process, not a static feed-forward one.

This suggests a method for training an artificial neural network (either spiking or feed-forward) in which two networks are intertwined and complementary to one another. One transmits signals in one direction, say from the sensory input, up through a hierarchical neural structure to more abstract levels, to eventually classify the signals. A complementary network, interleaved with it, carries signals in the opposite direction, say from abstract to concrete, from classification back to sensory stimulus. The signals or connection strengths in these two networks can be compared at the different levels of the network, and the differences used as a ‘training’ signal: strengthening connections where the differences are smaller and the correlation tighter, and weakening connections where the differences are larger and the correlation looser. The signals can be repeatedly bounced back and forth off the highest and lowest levels to set up a training loop, almost like dreaming?
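Purely to illustrate the flavour of this idea (a highly simplified toy, not the method as specified in the patent), the sketch below interleaves a small forward network and a backward network and uses their per-level disagreement as a local, Hebbian-style training signal. Every size, rate, and update rule here is an assumption made for the sketch; whether such a loop converges is exactly the open question.

```python
# Deliberately simplified sketch of a forward network and an interleaved backward
# network, with per-level disagreement between them used as a local training signal.
import numpy as np

rng = np.random.default_rng(3)
sizes = [16, 12, 8]                                  # sensory -> mid -> abstract (toy sizes)
W = [rng.normal(scale=0.3, size=(sizes[i], sizes[i+1])) for i in range(2)]   # forward weights
V = [rng.normal(scale=0.3, size=(sizes[i+1], sizes[i])) for i in range(2)]   # backward weights

def forward(x):
    acts = [x]
    for w in W:
        acts.append(np.tanh(acts[-1] @ w))
    return acts

def backward(top):
    recon = [top]
    for v in reversed(V):
        recon.insert(0, np.tanh(recon[0] @ v))
    return recon

lr = 0.005
for step in range(500):
    x = np.tanh(rng.normal(size=sizes[0]))           # stand-in sensory input
    acts = forward(x)
    recon = backward(acts[-1])                       # bounce the abstract level back down
    for level in range(2):
        agreement = 1.0 - np.mean((acts[level] - recon[level])**2)           # small diff -> +, large -> -
        W[level] += lr * agreement * np.outer(acts[level], acts[level+1])    # Hebbian-style nudge
        V[level] += lr * agreement * np.outer(recon[level+1], recon[level])

x = np.tanh(rng.normal(size=sizes[0]))
print(np.mean((forward(x)[0] - backward(forward(x)[-1])[0])**2))   # forward/backward mismatch
```

The design choice worth noting is that every update is local to a level: no error is propagated end to end, which is what distinguishes this kind of scheme from back-propagation.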

If this works well for synthetic neural networks, the result could be profound: we could ‘train’ these networks while they operate, in situ, in real time (think ‘Chappie’). This would be far more dynamic, useful, and powerful than back-propagation and gradient descent during dedicated training for CNNs, and it wraps the functionality of (self-training) CNNs, GANs, and even autoencoders into a single, more elegant architecture (which is to be expected if we are moving from special-purpose ‘hacks’ to more functional, robust, and brain-like networks and neurons). With a bit of ‘retrofitting’, perhaps this technique could also be used to train a standard feed-forward CNN with an inverse, feedback CNN interleaved into it.

Going back to GANs, they are close cousins of these feedforward/feedback interleaved networks, because the forward and backward networks are inverses of each other and each trains the other. The difference is that GANs are loosely coupled at specific interface points, whereas the feedforward/feedback networks can be very densely connected and as tightly coupled as you wish. This is huge, because one of the biggest difficulties with GANs is determining the feedback method and the signals to send between the generator and discriminator, and for data types beyond simple ones like images this can get complicated. The feedforward/feedback method can work for any arbitrary system – vision, hearing, speech, motor control, and so on – because the training and communication method is intrinsic to the system, adapted to the network architecture. Yes, just like human neural systems, these are multipurpose, elegant, and very powerful.

Another problem with spiking neural nets is: how do you connect the neurons in the first place? Sure, we can train the networks and strengthen/weaken synapses once we get rolling, but how do you even construct them to begin with? I am not going to go into enormous detail, but the idea is to start with small hand-wired networks and use genetic algorithms to explore the design space, training with the above technique against simple performance metrics. Then assign a gene to each subnet that works well, start replicating those subnets into bigger networks, again use genetic algorithms to shuffle the genes (and the subnets), train against more complex performance metrics, and keep iterating, hierarchically building larger networks at each step and assigning ‘genes’ to each construct at each level. Since the entire human brain develops from the information encoded in roughly 8,000 genes into a 100-billion-neuron, 100-trillion-synapse structure, it seems that this kind of hierarchical encoding would be the only way to do this in a natural or synthetic neural system, and we just have to figure out the details. In the provisional patent mentioned above, I call this collection of methods (and others) NeuroCAD.
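A bare-bones genetic-algorithm skeleton of the kind described (not NeuroCAD itself; the genome encoding and fitness function here are placeholder stand-ins) might look like this:

```python
# Bare-bones genetic algorithm: each genome encodes a small network's connection
# matrix, and we select, cross over, and mutate against a fitness score. In practice
# the fitness would come from building and training the subnet, not a toy formula.
import numpy as np

rng = np.random.default_rng(4)
N_NEURONS, POP_SIZE, GENERATIONS = 10, 30, 50

def random_genome():
    return rng.random((N_NEURONS, N_NEURONS)) < 0.2      # boolean connection matrix

def fitness(genome):
    # Placeholder metric: prefer genomes with about 15 connections. A real fitness
    # function would train the encoded subnet and score its performance.
    return -abs(float(genome.sum()) - 15.0)

def crossover(a, b):
    mask = rng.random(a.shape) < 0.5                     # mix connections from both parents
    return np.where(mask, a, b)

def mutate(genome, rate=0.02):
    flips = rng.random(genome.shape) < rate              # flip a few connections at random
    return genome ^ flips

population = [random_genome() for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:POP_SIZE // 3]                     # keep the best third
    children = [mutate(crossover(parents[rng.integers(len(parents))],
                                 parents[rng.integers(len(parents))]))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print("best fitness:", fitness(max(population, key=fitness)))
```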

One other drawback to implementing and scaling spiking neural networks is that, while they are capable of sparse computation (neurons and synapses only compute when a signal passes through them) and so could be very low-power, most of the hardware we can run them on today, like CPUs and GPUs, computes constantly, updating the model for every neuron and synapse every time-slice (there may be workarounds). Many companies, large and small, are working on neuromorphic computing hardware that more closely matches the behavior of spiking neurons and neural networks and can compute sparsely, but at present it is difficult to provide enough flexibility in the neural model and to interconnect, scale, and train these networks, especially at sizes large enough, and organized properly, to do useful work. A human brain equivalent would require over 100 billion neurons and 100 trillion synapses, far beyond the density of any 2D chip fabrication technology we have today. We will need new fabrication technologies, and probably 3D lattices of synthetic neurons, axons, dendrites and synapses, to get there. Ouch, who wants to build the first fab?

If we can start solving these issues and move towards more functional neuromorphic architectures that more fully represent how the brain, nervous system, and real neurons work and learn, we can start to consolidate some of the one-off, special-purpose Deep Learning methods used today into more powerful and flexible architectures that handle multiple modes of functionality with more elegant designs. With these models, we will also open up novel forms of neural computation and be able to apply them to tasks like computer vision, robot motor control, hearing, speech, and even cognition, in a way that is much more human-brain-like.

But will more sophisticated neural networks like this actually work in the end? Go look in the mirror and wave at yourself – yes, it CAN work; you are the proof. Can we replicate it in an artificial system as capable as you? Now that is the trillion-dollar question, isn’t it?

This article was written by Brent Oster (www.orbai.ai)

Source: https://www.quora.com/Why-has-deep-learning-become-so-famous-In-my-experience-it-s-not-applicable-to-the-majority-of-real-world-problems/answer/Brent-Oster?ch=10&share=ac72f996&srid=T5II