Presentation 2007-12-22
The Policy Gradient On The Average Reward Manifold
Takamitsu MATSUBARA, Jun MORIMOTO,
Abstract(in English) In this paper, we propose a novel policy-gradient-type reinforcement learning method on the average reward manifold, in which a metric that measures the effect of changes in the policy parameters on the average reward is introduced. In our method, the derivative of the average reward with respect to the policy improvement can be fixed as a constant. Moreover, around a (sub-)optimal policy, the policy gradient method is equivalent to Newton's method. Simple simulation results, with comparisons to previously proposed natural policy gradient methods, demonstrate the effectiveness of our policy gradient method.
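The sketch below is only an illustration of the general metric-preconditioned (natural-gradient-style) policy gradient scheme the abstract refers to: a Monte-Carlo estimate of the average-reward gradient is preconditioned by an estimated metric before the parameter step. The two-armed bandit, the softmax policy, the hyperparameters (alpha, batch, eps), and the use of the Fisher information matrix as the metric G are all assumptions made for this sketch; the paper's own metric on the average reward manifold is not reproduced here.

    # Minimal sketch: metric-preconditioned policy gradient on a 2-armed bandit.
    # The Fisher information matrix stands in for the metric; the paper's
    # average-reward-manifold metric is NOT implemented here.
    import numpy as np

    rng = np.random.default_rng(0)
    true_means = np.array([0.2, 0.8])   # expected reward of each arm (illustrative)
    theta = np.zeros(2)                 # softmax policy parameters

    def policy(theta):
        z = np.exp(theta - theta.max())
        return z / z.sum()

    def score(theta, a):
        # Gradient of log pi(a | theta) for a softmax policy.
        pi = policy(theta)
        e = np.zeros_like(theta)
        e[a] = 1.0
        return e - pi

    alpha, eps, batch = 0.1, 1e-6, 200
    for _ in range(300):
        pi = policy(theta)
        g = np.zeros(2)                 # Monte-Carlo gradient of the average reward
        G = np.zeros((2, 2))            # estimated metric (Fisher information here)
        for _ in range(batch):
            a = rng.choice(2, p=pi)
            r = rng.binomial(1, true_means[a])
            s = score(theta, a)
            g += r * s
            G += np.outer(s, s)
        g /= batch
        G = G / batch + eps * np.eye(2)         # regularize the (here singular) metric
        theta += alpha * np.linalg.solve(G, g)  # preconditioned policy-gradient step

    print("learned policy:", policy(theta))     # should concentrate on the better arm

Swapping in a different metric for G (e.g., one measuring how parameter changes affect the average reward, as the abstract proposes) would change only the lines that accumulate and regularize G; the rest of the update is unchanged.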
Keyword(in English) Reinforcement Learning / Policy Gradient / Natural Policy Gradient
Paper # NC2007-85
Date of Issue

Conference Information
Committee NC
Conference Date 2007/12/15 (1 day)
Place (in English)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language ENG
Title (in English) The Policy Gradient On The Average Reward Manifold
Sub Title (in English)
Keyword(1) Reinforcement Learning
Keyword(2) Policy Gradient
Keyword(3) Natural Policy Gradient
1st Author's Name Takamitsu MATSUBARA
1st Author's Affiliation Nara Institute of Science and Technology : ATR, CNS
2nd Author's Name Jun MORIMOTO
2nd Author's Affiliation ATR, CNS : JST, ICORP, Computational Brain Project
Date 2007-12-22
Paper # NC2007-85
Volume (vol) vol.107
Number (no) 410
Page pp.-
#Pages 6
Date of Issue