Presentation | 2006-06-16 Reinforcement learning under constraints generated by multiple reward functions Eiji UCHIBE, Kenji DOYA |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | The objectives of a standard reinforcement learner are specified by an extrinsic reward function given by human designers. In contrast, an intrinsically motivated reinforcement learner creates its own reward function based on novelty, prediction error, and learning progress. This paper proposes a novel approach for handling both intrinsic and extrinsic rewards in reinforcement learning. The extrinsic rewards impose constraints on the stochastic policy, while the intrinsic reward determines the current objective function of the learning system. By integrating policy gradient reinforcement learning algorithms with techniques from nonlinear programming, the proposed method maximizes the average intrinsic reward under the inequality constraints induced by the extrinsic rewards. We apply the proposed method to a simple MDP and to a robot arm control task. Experimental results show the validity of our method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | intrinsic and extrinsic rewards / nonlinear programming / policy gradient reinforcement learning |
Paper # | NC2006-22 |
Date of Issue |
Conference Information | |
Committee | NC |
---|---|
Conference Date | 2006/6/9 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Neurocomputing (NC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Reinforcement learning under constraints generated by multiple reward functions |
Sub Title (in English) | |
Keyword(1) | intrinsic and extrinsic rewards |
Keyword(2) | nonlinear programming |
Keyword(3) | policy gradient reinforcement learning |
1st Author's Name | Eiji UCHIBE |
1st Author's Affiliation | Okinawa Institute of Science and Technology Promotion Corporation |
2nd Author's Name | Kenji DOYA |
2nd Author's Affiliation | Okinawa Institute of Science and Technology Promotion Corporation / ATR Computational Neuroscience Laboratories |
Date | 2006-06-16 |
Paper # | NC2006-22 |
Volume (vol) | vol.106 |
Number (no) | 102 |
Page | pp.- |
#Pages | 6 |
Date of Issue |