Reinforcement learning under constraints generated by multiple reward functions ("Biodata Mining by Machine Learning" and "General")

Presentation	2006-06-16 Reinforcement learning under constraints generated by multiple reward functions Eiji UCHIBE, Kenji DOYA
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	The objectives of a standard reinforcement learner are specified by an extrinsic reward function given by human designers. In contrast, an intrinsically motivated reinforcement learner creates its own reward function based on novelty, prediction error, and learning progress. This paper proposes a novel approach for handling intrinsic and extrinsic rewards together in reinforcement learning. The extrinsic rewards impose constraints on the stochastic policy, while the intrinsic reward determines the current objective function of the learning system. By integrating policy gradient reinforcement learning algorithms with techniques from nonlinear programming, the proposed method maximizes the average intrinsic reward under the inequality constraints induced by the extrinsic rewards. We apply the proposed method to a simple MDP and to a control task of a robot arm. Experimental results show the validity of our method.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	intrinsic and extrinsic rewards / nonlinear programming / policy gradient reinforcement learning
Paper #	NC2006-22
Date of Issue

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Reinforcement learning under constraints generated by multiple reward functions
Sub Title (in English)
Keyword(1)	intrinsic and extrinsic rewards
Keyword(2)	nonlinear programming
Keyword(3)	policy gradient reinforcement learning
1st Author's Name	Eiji UCHIBE
1st Author's Affiliation	Okinawa Institute of Science and Technology Promotion Corporation
2nd Author's Name	Kenji DOYA
2nd Author's Affiliation	Okinawa Institute of Science and Technology Promotion Corporation / ATR Computational Neuroscience Laboratories
Date	2006-06-16
Paper #	NC2006-22
Volume (vol)	vol.106
Number (no)	102
Page	pp.-
#Pages	6