Presentation 2007-12-22
The Policy Gradient On The Average Reward Manifold
Takamitsu MATSUBARA, Jun MORIMOTO,
Abstract(in English) In this paper, we propose a novel policy-gradient-type reinforcement learning method on the average reward manifold, in which a metric that measures the effect of changes in the policy parameters on the average reward is introduced. In our method, the derivative of the average reward with respect to the policy improvement can be fixed as a constant. Moreover, around a (sub-)optimal policy, the policy gradient method is equivalent to Newton's method. Simple simulation results, with comparisons to previously proposed natural policy gradient methods, demonstrate the effectiveness of our policy gradient method.
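The sketch below is only an illustration of the general metric-preconditioned (natural-gradient-style) policy gradient scheme the abstract refers to: a Monte-Carlo estimate of the average-reward gradient is preconditioned by an estimated metric before the parameter step. The two-armed bandit, the softmax policy, the hyperparameters (alpha, batch, eps), and the use of the Fisher information matrix as the metric G are all assumptions made for this sketch; the paper's own metric on the average reward manifold is not reproduced here.

    # Minimal sketch: metric-preconditioned policy gradient on a 2-armed bandit.
    # The Fisher information matrix stands in for the metric; the paper's
    # average-reward-manifold metric is NOT implemented here.
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.8])   # expected reward of each arm (illustrative)
    theta = np.zeros(2)                 # softmax policy parameters

    def policy(theta):
        z = np.exp(theta - theta.max())
        return z / z.sum()

    def score(theta, a):
        # Gradient of log pi(a | theta) for a softmax policy.
        pi = policy(theta)
        e = np.zeros_like(theta)
        e[a] = 1.0
        return e - pi

    alpha, eps, batch = 0.1, 1e-6, 200
    for _ in range(300):
        pi = policy(theta)
        g = np.zeros(2)                 # Monte-Carlo gradient of the average reward
        G = np.zeros((2, 2))            # estimated metric (Fisher information here)
        for _ in range(batch):
            a = rng.choice(2, p=pi)
            r = rng.binomial(1, true_means[a])
            s = score(theta, a)
            g += r * s
            G += np.outer(s, s)
        g /= batch
        G = G / batch + eps * np.eye(2)         # regularize the (here singular) metric
        theta += alpha * np.linalg.solve(G, g)  # preconditioned policy-gradient step

    print("learned policy:", policy(theta))     # should concentrate on the better arm

Swapping in a different metric for G (e.g., one measuring how parameter changes affect the average reward, as the abstract proposes) would change only the lines that accumulate and regularize G; the rest of the update is unchanged.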
Keyword(in English) Reinforcement Learning / Policy Gradient / Natural Policy Gradient
Paper # NC2007-85
Date of Issue

Conference Information
Committee NC
Conference Date 2007/12/15 (1 day)
Place (in English)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language ENG
Title (in English) The Policy Gradient On The Average Reward Manifold
Sub Title (in English)
Keyword(1) Reinforcement Learning
Keyword(2) Policy Gradient
Keyword(3) Natural Policy Gradient
1st Author's Name Takamitsu MATSUBARA
1st Author's Affiliation Nara Institute of Science and Technology : ATR, CNS
2nd Author's Name Jun MORIMOTO
2nd Author's Affiliation ATR, CNS : JST, ICORP, Computational Brain Project
Date 2007-12-22
Paper # NC2007-85
Volume (vol) vol.107
Number (no) 410
Page pp.-
#Pages 6
Date of Issue