Presentation 2009-01-19
Reinforcement meta-learning rule solves the distal reward problem
Shojiro ARAKI, Yutaka SAKAI
Abstract(in Japanese) (See Japanese page)
Abstract(in English) It is known that spike-timing-dependent synaptic plasticity (STDP) depends on the initial strength of the synapse, and that this dependence is asymmetric between potentiation and depression. It has been pointed out that this fact implies a problem: the strength a synapse eventually reaches is confined to a small region determined by the initial-strength dependence, and hardly reflects the input-output statistics. If this holds true, the learning paradigm envisioned by Hebb breaks down. To solve the problem, we proposed a meta-learning rule that depends on reinforcement signals. We applied the meta-learning rule to an STDP learning rule with asymmetric initial-strength dependence, and demonstrated that a single model neuron can learn selectivity that reflects input statistics. We assume that the reinforcement signals reflect rewards given to the animal and spread over the whole brain. Here we demonstrate that a single model neuron can learn selectivity for inputs correlated with rewards that are given a few seconds after the inputs. The proposed reinforcement meta-learning can therefore solve the distal reward problem as well as the problem arising from the initial-strength dependence.
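The confinement problem described in the abstract can be illustrated with a small sketch. This is not the authors' model, which the abstract does not specify. It assumes a commonly used form of weight-dependent STDP in which potentiation adds a fixed amount and depression removes a fraction of the current strength. The names `stdp_step` and `mean_late_weight` and all parameter values are hypothetical. Under these assumptions the weight settles near `p_pot * a_plus / ((1 - p_pot) * a_minus)` regardless of where it started.

```python
import random


def stdp_step(w, potentiate, a_plus=0.01, a_minus=0.02):
    """One update of an ASSUMED weight-dependent STDP rule (not from the paper).

    Potentiation is additive (independent of w); depression is
    proportional to the current strength w. This asymmetry is the
    kind of initial-strength dependence the abstract refers to.
    """
    if potentiate:
        return w + a_plus
    return w - a_minus * w


def mean_late_weight(w0, p_pot=0.5, a_plus=0.01, a_minus=0.02,
                     steps=200_000, seed=0):
    """Average weight over the second half of a random update sequence.

    p_pot is the (hypothetical) probability that a spike pairing
    causes potentiation rather than depression.
    """
    rng = random.Random(seed)
    w = w0
    half = steps // 2
    total = 0.0
    for i in range(steps):
        w = stdp_step(w, rng.random() < p_pot, a_plus, a_minus)
        if i >= half:
            total += w
    return total / (steps - half)


if __name__ == "__main__":
    # Very different initial strengths end up in the same narrow region.
    for w0 in (0.05, 2.0):
        print(w0, round(mean_late_weight(w0), 3))
```

In this toy setting the final strength is set by the rule's own parameters rather than by the starting point, which is the restriction the reinforcement meta-learning rule is proposed to overcome.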
Keyword(in Japanese) (See Japanese page)
Keyword(in English) STDP / dopamine / meta-learning rule / distal reward
Paper # NC2008-95
Date of Issue

Conference Information
Committee NC
Conference Date 2009/1/12 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Reinforcement meta-learning rule solves the distal reward problem
Sub Title (in English)
Keyword(1) STDP
Keyword(2) dopamine
Keyword(3) meta-learning rule
Keyword(4) distal reward
1st Author's Name Shojiro ARAKI
1st Author's Affiliation Graduate School of Engineering, Tamagawa University
2nd Author's Name Yutaka SAKAI
2nd Author's Affiliation Tamagawa University Brain Science Institute
Date 2009-01-19
Paper # NC2008-95
Volume (vol) vol.108
Number (no) 383
Page pp.-
#Pages 5
Date of Issue