Presentation 2006-06-16
Reinforcement learning under constraints generated by multiple reward functions
Eiji UCHIBE, Kenji DOYA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) The objectives of a standard reinforcement learner are specified by an extrinsic reward function given by human designers. In contrast, an intrinsically motivated reinforcement learner creates its own reward function based on novelty, prediction error, and learning progress. This paper proposes a novel approach to handling intrinsic and extrinsic rewards together in reinforcement learning. The extrinsic rewards impose constraints on the stochastic policy, while the intrinsic reward determines the current objective function of the learning system. By integrating policy gradient reinforcement learning algorithms with techniques from nonlinear programming, the proposed method maximizes the average intrinsic reward under the inequality constraints induced by the extrinsic rewards. We apply the proposed method to a simple MDP and to a robot arm control task. Experimental results show the validity of the method.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) intrinsic and extrinsic rewards / nonlinear programming / policy gradient reinforcement learning
Paper # NC2006-22
Date of Issue

Conference Information
Committee NC
Conference Date 2006/6/9 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Reinforcement learning under constraints generated by multiple reward functions
Sub Title (in English)
Keyword(1) intrinsic and extrinsic rewards
Keyword(2) nonlinear programming
Keyword(3) policy gradient reinforcement learning
1st Author's Name Eiji UCHIBE
1st Author's Affiliation Okinawa Institute of Science and Technology Promotion Corporation
2nd Author's Name Kenji DOYA
2nd Author's Affiliation Okinawa Institute of Science and Technology Promotion Corporation / ATR Computational Neuroscience Laboratories
Date 2006-06-16
Paper # NC2006-22
Volume (vol) vol.106
Number (no) 102
Page pp.-
#Pages 6
Date of Issue