Presentation | 2000/7/11 Module-level Credit Assignment in Multiple Model-based Reinforcement Learning Kazuyuki Samejima, Kenji Doya, Mitsuo Kawato |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In this report, we propose a new method for realizing globally optimal policies in the multiple model-based reinforcement learning (MMRL) architecture [1]. MMRL decomposes a task softly in space and time by the "responsibility signal", a softmax function of the errors of local prediction models, and uses this signal for module selection. The proposed method extends MMRL by introducing a module switching pseudo reward (MSPR) so that the weighted sum of the modular value functions is globally consistent. MSPR is given by the temporal difference of the responsibility signal and the difference in the value functions between the switching modules. Thus MSPR enables the global value estimate to propagate between local modules. We test the performance of the proposed method on a non-linear control task, pendulum swing-up with limited torque. We show in simulation that the task is learned more quickly and more robustly by the proposed method than by conventional MMRL. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | reinforcement learning / multiple prediction models / module switching pseudo reward |
Paper # | NC2000-48 |
Date of Issue |
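
As a reading aid for the abstract, the following is a minimal sketch of a responsibility signal computed as a softmax over local prediction errors, and of the responsibility-weighted sum of modular value functions. The Gaussian form of the softmax, the width parameter `sigma`, the function names, and the example numbers are all assumptions made for illustration; none of them is taken from the report, and the MSPR term itself is not reproduced here because the abstract does not give its formula.

```python
import math


def responsibility(errors, sigma=1.0):
    """Softmax over negative scaled squared prediction errors.

    Assumed form (not from the report):
    lambda_i = exp(-e_i^2 / (2 sigma^2)) / sum_j exp(-e_j^2 / (2 sigma^2))
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in errors]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def global_value(responsibilities, module_values):
    """Responsibility-weighted sum of the modules' value estimates."""
    return sum(r * v for r, v in zip(responsibilities, module_values))


# Hypothetical prediction errors and value estimates for three modules.
errors = [0.1, 0.8, 1.5]
module_values = [1.0, 0.2, -0.5]

lam = responsibility(errors, sigma=0.5)
value = global_value(lam, module_values)
```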
Conference Information | |
Committee | NC |
---|---|
Conference Date | 2000/7/11 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Neurocomputing (NC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Module-level Credit Assignment in Multiple Model-based Reinforcement Learning |
Sub Title (in English) | |
Keyword(1) | reinforcement learning |
Keyword(2) | multiple prediction models |
Keyword(3) | module switching pseudo reward |
1st Author's Name | Kazuyuki Samejima |
1st Author's Affiliation | ERATO Kawato Dynamic Brain project, Japan Science and Technology Corporation |
2nd Author's Name | Kenji Doya |
2nd Author's Affiliation | Information Sciences Division, ATR International:CREST, Japan Science and Technology Corporation |
3rd Author's Name | Mitsuo Kawato |
3rd Author's Affiliation | ERATO Kawato Dynamic Brain project, Japan Science and Technology Corporation:ATR Human Information Processing Research Laboratories |
Date | 2000/7/11 |
Paper # | NC2000-48 |
Volume (vol) | vol.100 |
Number (no) | 191 |
Page | pp.- |
#Pages | 8 |
Date of Issue |