Reinforcement learning based on a policy gradient method for biped locomotion

Presentation	2004/3/12 Reinforcement learning based on a policy gradient method for biped locomotion Takeshi MORI, Yutaka NAKAMURA, Shin ISHII
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Recently, an actor-critic method has been proposed that uses a lower-dimensional projection of the value function, based on a policy gradient method. In this actor-critic method, approximating the value function is relatively easy because the dimension of the projection space is lower than that of the state and action spaces. This makes the method easier to apply to real problems such as robot control. In our previous study, we presented a CPG-actor-critic model, a reinforcement learning model based on biological concepts, and applied it to an automatic control problem of a biped robot. In this report, we apply the actor-critic method based on the policy gradient method to the CPG-actor-critic model and show that our method achieves robust control of the biped robot.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Reinforcement learning / Policy gradient method / Actor-critic method / Biped locomotion / Central pattern generator
Paper #	NC2003-206
Date of Issue

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Reinforcement learning based on a policy gradient method for biped locomotion
Sub Title (in English)
Keyword(1)	Reinforcement learning
Keyword(2)	Policy gradient method
Keyword(3)	Actor-critic method
Keyword(4)	Biped locomotion
Keyword(5)	Central pattern generator
1st Author's Name	Takeshi MORI
1st Author's Affiliation	Graduate School of Information Science, Nara Institute of Science and Technology
2nd Author's Name	Yutaka NAKAMURA
2nd Author's Affiliation	Graduate School of Information Science, Nara Institute of Science and Technology
3rd Author's Name	Shin ISHII
3rd Author's Affiliation	Graduate School of Information Science, Nara Institute of Science and Technology
Date	2004/3/12
Paper #	NC2003-206
Volume (vol)	vol.103
Number (no)	734
Page	pp.-
#Pages	6