Presentation | 2005/10/11 Variational Bayesian method for estimating natural policy gradient Takamitsu MATSUBARA, Jun MORIMOTO, Jun NAKANISHI, Masaaki SATO, Kenji DOYA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Natural policy gradient methods have recently attracted much attention as a way to improve a policy in reinforcement learning tasks. Learning with natural policy gradient methods can be much more efficient than learning with ordinary policy gradient methods because the natural gradient represents the steepest gradient direction in a policy parameter space with any underlying structure. However, when the data set sampled from the current policy is insufficient, the least-squares-based method proposed in a previous study cannot obtain a unique solution. In this paper, we propose an algorithm that estimates the natural policy gradient with a variational Bayesian method to avoid this ill-posed problem. In the proposed algorithm, we introduce sparse prior distributions as priors for the natural policy gradient and for the weights of the function approximator for the value function, and we also estimate the variance parameters of these sparse prior distributions from the sampled data. Thus, we can estimate the best possible natural policy gradient and value function even from a limited data set, because in the proposed method the basis functions that do not effectively explain the data are automatically identified by the estimated variance parameters. Using the stabilization of an inverted pendulum as an example reinforcement learning task, we demonstrate that the proposed method achieves better performance than the previous least-squares-based method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Reinforcement learning / natural policy gradient method / variational Bayesian method |
Paper # | NC2005-52 |
Date of Issue |
Conference Information | |
Committee | NC |
---|---|
Conference Date | 2005/10/11 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Neurocomputing (NC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Variational Bayesian method for estimating natural policy gradient |
Sub Title (in English) | |
Keyword(1) | Reinforcement learning |
Keyword(2) | natural policy gradient method |
Keyword(3) | variational Bayesian method |
1st Author's Name | Takamitsu MATSUBARA |
1st Author's Affiliation | Nara Institute of Science and Technology:ATR, CNS |
2nd Author's Name | Jun MORIMOTO |
2nd Author's Affiliation | ATR, CNS:ICORP, JST |
3rd Author's Name | Jun NAKANISHI |
3rd Author's Affiliation | ATR, CNS:ICORP, JST |
4th Author's Name | Masaaki SATO |
4th Author's Affiliation | ATR, CNS |
5th Author's Name | Kenji DOYA |
5th Author's Affiliation | Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology:ATR, CNS:Nara Institute of Science and Technology |
Date | 2005/10/11 |
Paper # | NC2005-52 |
Volume (vol) | vol.105 |
Number (no) | 342 |
Page | pp.- |
#Pages | 6 |
Date of Issue |