Today I came across a fascinating article in Seed Magazine on dopamine, artificial intelligence and social learning. There’s so much good stuff in there that it may inspire a few posts. Here’s the first one.
Dopamine is great stuff. It doesn’t just let us take pleasure in our world — it’s also key in helping us understand it. Cambridge University’s Wolfram Schultz studied dopamine’s role in triggering Parkinson’s disease. Schultz recorded dopamine levels in monkeys’ brains to study how dopamine neurons died in the part of the brain that controls movement. As he collected data for the study, he noticed that dopamine neurons fired just before the monkeys were rewarded for moving.
Stunned, Schultz realized that he had just discovered the brain’s reward mechanism — and he’d done it by accident. According to Seed:
His experiments observed a simple protocol: He played a loud tone, waited for a few seconds, and then squirted a few drops of apple juice into the mouth of a monkey. While the experiment was unfolding, Schultz was probing the dopamine-rich areas of the monkey brain with a needle that monitored the electrical activity inside individual cells. At first the dopamine neurons didn’t fire until the juice was delivered; they were responding to the actual reward. However, once the animal learned that the tone preceded the arrival of juice — this requires only a few trials — the same neurons began firing at the sound of the tone instead of the sweet reward. And then eventually, if the tone kept on predicting the juice, the cells went silent. They stopped firing altogether.
The data were fascinating and utterly perplexing. Schultz knew that the dopamine neurons were a key part of the learning process; he just didn’t know how they were doing it.
Meanwhile, over at the Salk Institute, computer scientists Read Montague and Peter Dayan were working on the temporal difference reinforcement learning (TDRL) model of artificial intelligence. Their goal was to create a “neuronlike” program that could learn simple rules to perform goal-oriented behaviors. The Seed article notes:
The basic premise is straightforward: The software makes predictions about what will happen — about how a checkers game will unfold for example — and then compares these predictions with what actually happens. If the prediction is right, that series of predictions gets reinforced. However, if the prediction is wrong, the software reevaluates its representation of the game.
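The prediction-and-update loop Seed describes can be sketched in a few lines of code. This is a minimal, illustrative TD(0) example, not the actual Montague–Dayan software: the toy trial structure (a tone, a delay, then juice), the learning rate, and the discount factor are all my assumptions.

```python
# Minimal temporal-difference (TD) learning sketch. A toy three-state
# "trial" mimics Schultz's protocol: tone, pause, then a juice reward.
# All names and constants here are illustrative assumptions.

ALPHA = 0.5   # learning rate: how strongly each error updates a prediction
GAMMA = 1.0   # discount factor: how much upcoming reward counts right now

# The program's value estimates ("predictions") for each state, initially zero.
values = {"tone": 0.0, "delay": 0.0, "juice": 0.0}

# One trial visits the states in order; reward arrives only at the end.
episode = [("tone", 0.0), ("delay", 0.0), ("juice", 1.0)]

def run_trial(values, episode):
    """Update each state's value from its TD prediction error."""
    errors = []
    for i, (state, reward) in enumerate(episode):
        # Value of the next state (zero past the end of the trial).
        next_value = values[episode[i + 1][0]] if i + 1 < len(episode) else 0.0
        # Prediction error: what actually happened vs. what we predicted.
        delta = reward + GAMMA * next_value - values[state]
        values[state] += ALPHA * delta
        errors.append(delta)
    return errors

for _ in range(30):
    errors = run_trial(values, episode)

print(values)
```

After enough trials the error at the reward itself shrinks toward zero while the earliest predictor (the tone) comes to carry the full prediction — the same shift Schultz saw his neurons make from the juice to the tone.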
Montague and Dayan were holding the key to Schultz’s dopamine-release-as-learning-mechanism conundrum. And Schultz had collected neurological data that supported their TDRL model. Dayan discovered the link in 1991. Seed quotes Montague:
“The only reason we could see it so clearly,” Montague says, “is because we came at it from this theoretical angle. If you were an experimentalist seeing this data, it would have been extremely confusing. What the hell are these cells doing? Why aren’t they just responding to the juice?”
The truly fascinating thing about the model is that it’s based on expectations. Predictability, correlation and contrast are the basis of learning — not reward and punishment. I may just be a dumb dog trainer, but this isn’t news to me.
A predictable world makes sense. Predictability builds confidence. When you work with a shy or insecure animal, maintaining a high degree of predictability can build that confidence. Correlation is the foundation of predictability. Contrast tells us what things don’t correlate; it teaches us what things don’t work (and what works better than we expected). It also works neatly in conjunction with correlation to teach animals how to generalize. Add a little contrast to a lot of correlation and you’ve created the perfect proofing exercise.
These three processes work together to help an animal make sense out of the geopbytes of information that inundate it in day-to-day life. And much of the beauty of the system lies in the fact that it’s a dynamic one. The first time that we notice that A correlates to B, we are surprised. The neurochemistry of surprise makes novel events memorable, because bursts of dopamine are emitted in the wake of unexpected rewards. After a few repetitions of the pattern, we store that bit (or byte, as the case may be) of correlative information away. If, at a later time, the correlation fails, we’re surprised by the failure in our expectations, and our neurons readjust to the new situation as dopamine production declines when we expect a reward that we don’t receive. Sometimes those rewards are explicit ones like food treats or payroll bonuses — but they can also be implicit rewards like the excitement you feel when your team wins the big game or the thrill your dog feels when he finds his bumper in that tuft of grass where he expected it to be.
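That surprise-then-fade-then-dip cycle can be illustrated with a simple delta rule. This is a toy sketch of my own, not the article’s model: the learning rate and trial counts are arbitrary assumptions, chosen just to show the shape of the curve.

```python
# Toy delta-rule sketch of "surprise": large prediction error on the first
# pairing, fading with repetition, then going negative when a learned
# correlation fails. The constants are illustrative assumptions.

ALPHA = 0.3        # how quickly expectations adjust toward reality
expectation = 0.0  # how much reward the cue currently predicts

def present_cue(reward):
    """Run one trial; return the prediction error (the 'surprise')."""
    global expectation
    error = reward - expectation   # surprise = outcome minus expectation
    expectation += ALPHA * error   # expectations shift toward what happened
    return error

# Ten rewarded trials: surprise is maximal at first, then fades away
# as the correlation between cue and reward gets stored.
rewarded = [present_cue(1.0) for _ in range(10)]

# Then the correlation fails: the expected reward is withheld and the
# error swings negative -- the readjustment of a disappointed prediction.
omitted = present_cue(0.0)

print(rewarded[0], rewarded[-1], omitted)
```

The first rewarded trial produces the biggest positive surprise, the tenth produces almost none, and the omitted reward produces a large negative error — the same three phases Schultz recorded in his monkeys.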
But rewards alone won’t create learning. We need the contrast that comes from errors too. Again from Seed (bold mine):
“The accuracy comes from the mismatch,” Montague says. “You learn how the world works by focusing on the prediction errors, on the events that you didn’t expect.” Our knowledge, in other words, emerges from our cellular mistakes. The brain learns how to be right by focusing on what it got wrong.
And there you have it. It’s impossible to learn unless you make mistakes. The freedom to make mistakes is vital to intellectual growth. This is why raising a dog in an over-managed way does irreparable harm to his intellect. A dog can only learn to think for himself if he’s given the opportunity to do so. Making mistakes and coping with their consequences is a key part of learning. It helps the dog evaluate the contrast he needs to generalize his learning to new situations — and to reject previous learning when it doesn’t fit a novel situation.
I think that it’s also interesting to note that the model appears to imply that the implicit dopamine rewards of making correct, correlative predictions are stronger and longer-lasting than explicit rewards like food. This may help explain why the associations created by negative reinforcement (a concept most trainers don’t really understand) are very strong and long-lasting.