Presentation 2005/10/11
Variational Bayesian method for estimating natural policy gradient
Takamitsu MATSUBARA, Jun MORIMOTO, Jun NAKANISHI, Masaaki SATO, Kenji DOYA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Recently, natural policy gradient methods have been drawing much attention as a way of improving a policy in reinforcement learning tasks. Learning with natural policy gradient methods can be much more efficient than with ordinary policy gradient methods because the natural gradient represents the steepest gradient direction in a policy parameter space with any underlying structure. However, when the data set sampled from the current policy is insufficient, a least squares based method proposed in a previous study cannot obtain a unique solution. In this paper, we propose an algorithm that estimates the natural policy gradient with a variational Bayesian method to avoid this ill-posed problem. In the proposed algorithm, we introduce sparse prior distributions as priors for the natural policy gradient and for the weights of the function approximator for the value function, and we also estimate the variance parameters of these sparse prior distributions from the sampled data. Thus, we can estimate the best possible natural policy gradient and value function even from a limited data set, because in the proposed method the basis functions that do not effectively explain the data are automatically identified by the estimated variance parameters. As an example, we demonstrate that the proposed method achieves better performance than the previous least squares based method in a reinforcement learning task of stabilizing an inverted pendulum.
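The ill-posedness described in the abstract, and the role of sparse priors with estimated variance parameters, can be illustrated on a generic linear regression problem. The sketch below is not the authors' algorithm: it assumes a plain regression target in place of the natural policy gradient and value function, a fixed noise precision `beta`, and Gamma hyperpriors `a0`, `b0`; all function and variable names are hypothetical. It shows that with fewer samples than basis functions the least squares normal-equation matrix is rank-deficient, while a mean-field variational Bayesian update with one precision per weight still yields a single well-defined estimate.

```python
import numpy as np

def normal_equation_rank(Phi):
    """Rank of Phi^T Phi; below Phi.shape[1] means least squares is not unique."""
    return np.linalg.matrix_rank(Phi.T @ Phi)

def vb_sparse_regression(Phi, y, beta=100.0, a0=1e-6, b0=1e-6, n_iter=200):
    """Mean-field VB for y = Phi w + noise with per-weight Gaussian priors.

    Prior: w_i ~ N(0, 1/alpha_i), alpha_i ~ Gamma(a0, b0).
    beta is an assumed, fixed noise precision (not estimated here).
    Returns the posterior mean of w and the expected precisions E[alpha_i].
    """
    n, d = Phi.shape
    e_alpha = np.ones(d)  # initial expected prior precisions
    for _ in range(n_iter):
        # q(w) = N(m, S)
        S = np.linalg.inv(beta * Phi.T @ Phi + np.diag(e_alpha))
        m = beta * S @ Phi.T @ y
        # q(alpha_i) = Gamma(a, b_i); a large E[alpha_i] switches basis i off
        a = a0 + 0.5
        b = b0 + 0.5 * (m ** 2 + np.diag(S))
        e_alpha = a / b
    return m, e_alpha

# Synthetic data (an assumption for illustration): 8 samples, 20 basis functions,
# only two of which actually contribute to the target.
rng = np.random.default_rng(0)
n, d = 8, 20
Phi = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[[2, 7]] = [1.5, -2.0]
y = Phi @ w_true + 0.05 * rng.normal(size=n)

rank = normal_equation_rank(Phi)
m, e_alpha = vb_sparse_regression(Phi, y)
print("rank of Phi^T Phi:", rank, "of", d)
print("posterior mean weights:", np.round(m, 2))
```

Basis functions whose expected precision grows large have their weights driven toward zero, which is the sense in which the estimated variance parameters identify basis functions that do not explain the data.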
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Reinforcement learning / natural policy gradient method / variational Bayesian method
Paper # NC2005-52
Date of Issue

Conference Information
Committee NC
Conference Date 2005/10/11 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Variational Bayesian method for estimating natural policy gradient
Sub Title (in English)
Keyword(1) Reinforcement learning
Keyword(2) natural policy gradient method
Keyword(3) variational Bayesian method
1st Author's Name Takamitsu MATSUBARA
1st Author's Affiliation Nara Institute of Science and Technology:ATR, CNS
2nd Author's Name Jun MORIMOTO
2nd Author's Affiliation ATR, CNS:ICORP, JST
3rd Author's Name Jun NAKANISHI
3rd Author's Affiliation ATR, CNS:ICORP, JST
4th Author's Name Masaaki SATO
4th Author's Affiliation ATR, CNS
5th Author's Name Kenji DOYA
5th Author's Affiliation Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology:ATR, CNS:Nara Institute of Science and Technology
Date 2005/10/11
Paper # NC2005-52
Volume (vol) vol.105
Number (no) 342
Page pp.pp.-
#Pages 6
Date of Issue