NC2000-48 Inter-module Credit Assignment in Multiple Model-based Reinforcement Learning

Presentation	2000/7/11 Module-level Credit Assignment in Multiple Model-based Reinforcement Learning Kazuyuki Samejima, Kenji Doya, Mitsuo Kawato
Abstract(in English)	In this report, we propose a new method for realizing globally optimal policies in the multiple model-based reinforcement learning (MMRL) architecture [1]. MMRL decomposes a task in space and time using a "responsibility signal", a softmax function of the errors of local prediction models, which softly selects among modules. The proposed method extends MMRL with a module switching pseudo reward (MSPR) that makes the responsibility-weighted sum of the modular value functions globally consistent. The MSPR is computed from the temporal difference of the responsibility signal and the difference between the value functions of the modules being switched, which allows global value estimates to propagate between local modules. We test the proposed method on a nonlinear control task: swinging up a pendulum with limited torque. Simulations show that the proposed method learns the task faster and more robustly than conventional MMRL.
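The abstract names three quantities: a responsibility signal (softmax over prediction errors), a global value (responsibility-weighted sum of modular value functions), and a switching pseudo reward built from the change in responsibility and value differences between modules. The sketch below illustrates them under stated assumptions. The Gaussian form of the softmax and the scale `sigma` are assumptions; the exact MSPR equation is not given on this page, so `switching_pseudo_reward` is a hypothetical form chosen for illustration, not the report's definition. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def responsibility(pred_errors: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Softmax over negative squared prediction errors.

    pred_errors[i] is the magnitude of module i's state-prediction error.
    sigma is an ASSUMED scale parameter, not a value taken from the report.
    """
    logits = [-(e * e) / (2.0 * sigma * sigma) for e in pred_errors]
    m = max(logits)  # subtract the max before exp() for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def global_value(lam: Sequence[float], values: Sequence[float]) -> float:
    """Responsibility-weighted sum of the modular value estimates V_i(x)."""
    return sum(l * v for l, v in zip(lam, values))


def switching_pseudo_reward(
    lam_prev: Sequence[float],
    lam_next: Sequence[float],
    values_next: Sequence[float],
    gamma: float = 0.9,
) -> List[float]:
    """HYPOTHETICAL module-switching pseudo reward (illustration only).

    A module that loses responsibility between t and t+1 receives the gap
    between the global value and its own value at x(t+1), scaled by the
    amount of responsibility it lost and by the discount gamma.
    Modules whose responsibility does not drop receive zero.
    """
    v_global = global_value(lam_next, values_next)
    rewards = []
    for i, v_i in enumerate(values_next):
        lost = max(0.0, lam_prev[i] - lam_next[i])  # drop in module i's responsibility
        rewards.append(lost * gamma * (v_global - v_i))
    return rewards


# Usage: control hands over from module 0 to module 1, which predicts a higher value.
lam_t = responsibility([0.1, 2.0])   # module 0 predicts well at time t
lam_t1 = responsibility([2.0, 0.1])  # module 1 predicts well at time t+1
print(switching_pseudo_reward(lam_t, lam_t1, values_next=[0.0, 1.0]))
```

In this sketch, adding the pseudo reward to module 0's TD target pulls V_0 toward the global estimate at the point where control passes to module 1, which is one way to realize the inter-module value propagation the abstract describes.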
Keyword(in English)	reinforcement learning / multiple prediction models / module switching pseudo reward
Paper #	NC2000-48
Date of Issue

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in English)	Module-level Credit Assignment in Multiple Model-based Reinforcement Learning
Sub Title (in English)
Keyword(1)	reinforcement learning
Keyword(2)	multiple prediction models
Keyword(3)	module switching pseudo reward
1st Author's Name	Kazuyuki Samejima
1st Author's Affiliation	ERATO Kawato Dynamic Brain Project, Japan Science and Technology Corporation
2nd Author's Name	Kenji Doya
2nd Author's Affiliation	Information Sciences Division, ATR International:CREST, Japan Science and Technology Corporation
3rd Author's Name	Mitsuo Kawato
3rd Author's Affiliation	ERATO Kawato Dynamic Brain Project, Japan Science and Technology Corporation:ATR Human Information Processing Research Laboratories
Date	2000/7/11
Volume (vol)	vol.100
Number (no)	191
#Pages	8