Reinforcement meta-learning rule solves the distal reward problem (Learning, Biological Models, Neural Dynamics, General)

Shojiro ARAKI; Yutaka SAKAI

Presentation	2009-01-19 Reinforcement meta-learning rule solves the distal reward problem Shojiro ARAKI, Yutaka SAKAI
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	It is known that spike-timing-dependent synaptic plasticity (STDP) depends on the initial strength of the synapse, and that this dependence is asymmetric between potentiation and depression. It has been pointed out that this fact implies a problem: the final strength of a synapse would be confined to a small region determined by the initial-strength dependence and would hardly reflect the input-output statistics. If this holds true, the learning paradigm envisioned by Hebb would break down. To solve the problem, we proposed a meta-learning rule that depends on reinforcement signals. We applied the meta-learning rule to an STDP learning rule with asymmetric initial-strength dependence and demonstrated that a single model neuron can learn selectivity reflecting the input statistics. We assume that the reinforcement signals reflect rewards given to the animal and spread over the whole brain. Here we demonstrate that a single model neuron can learn selectivity for inputs correlated with rewards that are given a few seconds after the inputs. The proposed reinforcement meta-learning rule can therefore solve the distal reward problem as well as the problem arising from the initial-strength dependence.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	STDP / dopamine / meta-learning rule / distal reward
Paper #	NC2008-95
Date of Issue

Conference Information
Committee	NC
Conference Date	2009/1/12 (1 day)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Reinforcement meta-learning rule solves the distal reward problem
Sub Title (in English)
Keyword(1)	STDP
Keyword(2)	dopamine
Keyword(3)	meta-learning rule
Keyword(4)	distal reward
1st Author's Name	Shojiro ARAKI
1st Author's Affiliation	Graduate School of Engineering, Tamagawa University
2nd Author's Name	Yutaka SAKAI
2nd Author's Affiliation	Tamagawa University Brain Science Institute
Date	2009-01-19
Paper #	NC2008-95
Volume (vol)	vol.108
Number (no)	383
Page	pp.-
#Pages	5
Date of Issue