Monday, June 6, 2016

Some thoughts on reinforcement learning

What role does reinforcement learning play in the natural learning of humans?

Much of learning is about gaining the ability to perform certain actions, certain skills, of various levels of complexity. Fundamentally, what is learned is to perceive the environmental input, taking in the various stimuli around us (visual, auditory, somatosensory), and to produce a certain motor output that causes a change in the environment (the perceptual input), repeatedly and continuously. We learn to produce certain changes, certain outcomes, as a means of controlling the environment, or of feeling competent, by having the ability to produce a specific outcome at will. However, it appears that most outcomes are not inherently "positive" or "negative"; they just are (though some might be more "interesting" than others at any given point). Most outcomes in that context are continuous: if your actions are a little different, the outcome will be a little different (also, if the environmental context is a little different and the actions are the same, the outcome may still be different). Developing higher levels of a skill involves constructing an understanding of those small differences and knowing how to utilize them flexibly, depending on the context and the task at hand (this also happens to be my best definition of intelligence).

The ultimate goal is of course to meet one's needs, so actions that produce desirable outcomes (food, social validation) would be more likely to be repeated, while those that produce undesirable outcomes (pain, social rejection) would be less likely to be repeated. However, this is just not the primary mechanism by which the production of complex actions is learned. The good/bad spectrum is not sufficient for flexible actions, since rewards for specific outcomes are by definition discrete rather than continuous.

If this analysis is correct, what would be the effect of trying to influence someone's behavior and learning by offering rewards for performance? This would increase the motivation to perform, but if the action is sufficiently complicated, and sufficiently removed from the current level of the individual, the rewards are not likely to improve the ability to perform (which requires more exploration and "playing" with things). They may in fact have the opposite effect, by increasing frustration over the current inability to perform, which would inhibit the ability to tap into the proper learning mechanism. Additionally, the presence of rewards (or punishments) becomes a part of the context that the individual responds to in their actions, and therefore those actions may not transfer to situations where the same rewards (or punishments) are not present.

That is why the education paradigm as we know it is practically unable to produce true competence in any subject matter, other than perhaps in areas that the kids already have an interest in (and would have probably mastered on their own anyway). Since the entire approach revolves around externally-controlled behaviors ("now we're working on this specific thing, this is what's 'right' and anything else is 'wrong', practice until you are able to produce the correct response"), there is no room for exploratory behavior, and besides, kids are less likely to be interested in exploring subjects that they did not select on their own. To tap into the proper learning mechanism, the internally-driven desire to develop a skill or understand a subject, one needs to have a purpose for it. There have to be no external controls, nobody suggesting what to work on, no rush to "get it right" at any specific time point. It is an extremely egocentric endeavor, although it could also be used to master social skills and the ability to produce value for others, as long as the individual has that goal for themselves (most young kids do, as long as it is their choice). That's the way to develop true competence, with the confidence that comes with it (very different from the insecure competitiveness that results from externally-controlled performance).

Incidentally, this analysis also explains why reinforcement learning (and training on simple win/lose games) cannot yield "real" intelligence for artificial intelligence researchers. It's funny how they hope to "solve intelligence" without having any real understanding of what intelligence is, in the hope that once they build it they will be able to understand it... It just doesn't work this way.
