Presentation 2000/7/11
Module-level Credit Assignment in Multiple Model-based Reinforcement Learning
Kazuyuki Samejima, Kenji Doya, Mitsuo Kawato
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In this report, we propose a new method for realizing globally optimal policies in the multiple model-based reinforcement learning (MMRL) architecture [1]. MMRL decomposes a task softly in space and time by means of the "responsibility signal", which is a softmax function of the errors of the local prediction models and is used for module selection. The proposed method extends MMRL with a module switching pseudo reward (MSPR) so that the weighted sum of the modular value functions is globally consistent. MSPR is given by the temporal difference of the responsibility signal and the difference in the value functions between switching modules. Thus MSPR enables the global value estimate to propagate between local modules. We test the performance of the proposed method on a nonlinear control task, pendulum swing-up with limited torque. We show in simulation that the proposed method learns the task more quickly and robustly than conventional MMRL.
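As an illustration of the responsibility signal described in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes scalar prediction errors per module and a Gaussian-style softmax with a hypothetical scale parameter `sigma`; the function names `responsibility` and `global_value` are invented for this example. The MSPR term itself is not reproduced here because the abstract does not give its exact formula.

```python
import math


def responsibility(errors, sigma=1.0):
    """Softmax over negative scaled squared prediction errors.

    errors: one prediction error per module (assumed scalar here).
    sigma:  hypothetical scale parameter controlling how sharply
            the module with the smallest error is favoured.
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in errors]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def global_value(responsibilities, module_values):
    """Responsibility-weighted sum of the modular value estimates."""
    return sum(l * v for l, v in zip(responsibilities, module_values))


if __name__ == "__main__":
    lam = responsibility([0.1, 0.8, 1.5], sigma=0.5)
    print(lam)
    print(global_value(lam, [1.0, 0.2, -0.5]))
```

The module whose prediction model has the smallest error receives the largest responsibility, and the responsibilities always sum to one, so the global value is a convex combination of the modular values.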
Keyword(in Japanese) (See Japanese page)
Keyword(in English) reinforcement learning / multiple prediction models / module switching pseudo reward
Paper # NC2000-48
Date of Issue

Conference Information
Committee NC
Conference Date 2000/7/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Module-level Credit Assignment in Multiple Model-based Reinforcement Learning
Sub Title (in English)
Keyword(1) reinforcement learning
Keyword(2) multiple prediction models
Keyword(3) module switching pseudo reward
1st Author's Name Kazuyuki Samejima
1st Author's Affiliation ERATO Kawato Dynamic Brain project, Japan Science and Technology Corporation
2nd Author's Name Kenji Doya
2nd Author's Affiliation Information Sciences Division, ATR International:CREST, Japan Science and Technology Corporation
3rd Author's Name Mitsuo Kawato
3rd Author's Affiliation ERATO Kawato Dynamic Brain project, Japan Science and Technology Corporation:ATR Human Information Processing Research Laboratories
Date 2000/7/11
Paper # NC2000-48
Volume (vol) vol.100
Number (no) 191
Page pp.-
#Pages 8
Date of Issue