Variational Bayesian method for estimating natural policy gradient (General, Brain and Human Modeling, General)

Takamitsu MATSUBARA; Jun MORIMOTO; Jun NAKANISHI; Masaaki SATO; Kenji DOYA

Presentation	2005/10/11 Variational Bayesian method for estimating natural policy gradient. Takamitsu MATSUBARA, Jun MORIMOTO, Jun NAKANISHI, Masaaki SATO, Kenji DOYA
Abstract(in English)	Recently, natural policy gradient methods have drawn much attention as a way to improve policies in reinforcement learning tasks. Learning with natural policy gradients can be much more efficient than learning with ordinary policy gradients, because the natural gradient is the steepest ascent direction that takes the underlying structure of the policy parameter space into account. However, when the data set sampled from the current policy is insufficient, the least-squares method proposed in a previous study cannot obtain a unique solution. In this paper, we propose an algorithm that estimates the natural policy gradient with a variational Bayesian method in order to avoid this ill-posed problem. In the proposed algorithm, we place sparse prior distributions on the natural policy gradient and on the weights of the function approximator for the value function, and we also estimate the variance parameters of these sparse priors from the sampled data. Because the estimated variance parameters automatically identify the basis functions that do not effectively explain the data, the proposed method can estimate the natural policy gradient and the value function as well as possible even from a limited data set. As an example, we demonstrate that the proposed method outperforms the previous least-squares method on a reinforcement learning task of stabilizing an inverted pendulum.
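The abstract's key step, estimating regression coefficients when there are fewer samples than coefficients, can be illustrated with a minimal sketch. It assumes (this page does not say so) that the natural gradient and the value-function weights are the coefficients of a linear regression of sampled returns on stacked features such as the score function of the policy and value-function basis functions, as in natural actor-critic style estimators. As the sparse prior it uses a textbook mean-field variational Bayes scheme with ARD priors (a zero-mean Gaussian per coefficient whose precision has a Gamma prior). The function `vb_ard_regression`, its hyperparameters `a0`, `b0`, `c0`, `d0`, and the synthetic data are hypothetical; this is not the authors' algorithm and it reproduces none of their results.

```python
import numpy as np


def vb_ard_regression(X, y, n_iter=500, a0=1e-6, b0=1e-6, c0=1e-6, d0=1e-6, tol=1e-10):
    """Hypothetical helper: mean-field VB for y = X @ w + noise with ARD priors.

    Assumed model (an illustration, not taken from the paper):
      w_i ~ N(0, 1/alpha_i),  alpha_i ~ Gamma(a0, b0)   (sparse / ARD prior)
      noise ~ N(0, 1/beta),   beta    ~ Gamma(c0, d0)
    Returns the mean and covariance of q(w), plus E[alpha] and E[beta].
    """
    n, d = X.shape
    XtX, Xty = X.T @ X, X.T @ y
    e_alpha = np.ones(d)  # E[alpha_i]: estimated prior precision of each weight
    e_beta = 1.0          # E[beta]: estimated noise precision
    m = np.zeros(d)
    for _ in range(n_iter):
        # q(w) = N(m, S) stays well-defined because diag(e_alpha) > 0,
        # even when XtX is singular (fewer samples than weights).
        S = np.linalg.inv(e_beta * XtX + np.diag(e_alpha))
        m_new = e_beta * S @ Xty
        # q(alpha_i) = Gamma(a0 + 1/2, b0 + E[w_i^2] / 2): weights the data do
        # not support receive a large precision and are shrunk toward zero.
        e_alpha = (a0 + 0.5) / (b0 + 0.5 * (m_new**2 + np.diag(S)))
        # q(beta) = Gamma(c0 + n/2, d0 + E[||y - X w||^2] / 2)
        resid = y - X @ m_new
        e_beta = (c0 + 0.5 * n) / (d0 + 0.5 * (resid @ resid + np.trace(XtX @ S)))
        converged = np.max(np.abs(m_new - m)) < tol
        m = m_new
        if converged:
            break
    return m, S, e_alpha, e_beta


# Synthetic stand-in for a small data set: fewer samples than coefficients.
rng = np.random.default_rng(0)
n_samples, n_features = 15, 20
X = rng.standard_normal((n_samples, n_features))
w_true = np.zeros(n_features)
w_true[0], w_true[3] = 2.0, -1.5  # only two basis functions matter
y = X @ w_true + 0.05 * rng.standard_normal(n_samples)

# Ordinary least squares: the normal matrix X^T X is rank-deficient here,
# so the normal equations have infinitely many solutions.
print("rank of X^T X:", np.linalg.matrix_rank(X.T @ X), "of", n_features)

m, S, e_alpha, e_beta = vb_ard_regression(X, y)
print("posterior means of w[0], w[3]:", np.round(m[[0, 3]], 2))
print("features with the smallest estimated precision:", np.argsort(e_alpha)[:2])
```

Because every coefficient gets its own estimated precision, the posterior over the weights is unique even though the least-squares problem is not, and coefficients the data do not support end up with large precisions and near-zero means. This mirrors, under the stated assumptions, the abstract's point that ineffective basis functions are identified by the estimated variance parameters.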
Keyword(in English)	Reinforcement learning / natural policy gradient method / variational Bayesian method
Paper #	NC2005-52
Date of Issue

Conference Information
Committee	NC
Conference Date	2005/10/11 (1 day)

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in English)	Variational Bayesian method for estimating natural policy gradient
Keyword(1)	Reinforcement learning
Keyword(2)	natural policy gradient method
Keyword(3)	variational Bayesian method
1st Author's Name	Takamitsu MATSUBARA
1st Author's Affiliation	Nara Institute of Science and Technology; ATR, CNS
2nd Author's Name	Jun MORIMOTO
2nd Author's Affiliation	ATR, CNS; ICORP, JST
3rd Author's Name	Jun NAKANISHI
3rd Author's Affiliation	ATR, CNS; ICORP, JST
4th Author's Name	Masaaki SATO
4th Author's Affiliation	ATR, CNS
5th Author's Name	Kenji DOYA
5th Author's Affiliation	Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology; ATR, CNS; Nara Institute of Science and Technology
Date	2005/10/11
Volume (vol)	vol.105
Number (no)	342
#Pages	6