Reinforcement learning based on a policy gradient method for biped locomotion

Takeshi Mori; Yutaka Nakamura; Shin Ishii

Presentation	2004/3/12. Reinforcement learning based on a policy gradient method for biped locomotion. Takeshi Mori, Yutaka Nakamura, Shin Ishii
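Abstract (translated from Japanese)	Recently, a "policy-gradient-based actor-critic method" has been developed that learns with a low-dimensional value function projected onto the space of basis functions determined by the policy parameters. Because this method approximates the value function in a space of lower dimension than the state-action space, learning the value function is comparatively easy, and the method is expected to be useful for real problems with large state spaces, such as robot control. The CPG-actor-critic model is a reinforcement learning model for the CPG controller, a biologically inspired robot control mechanism, and it requires learning a value function over the state-action space. In this report, we apply the policy-gradient-based actor-critic method to the learning procedure of the CPG-actor-critic model on a walking-acquisition task using a biped robot simulator, and thereby show the effectiveness of the method on a real problem.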
Abstract (English)	Recently, an actor-critic method based on a policy gradient method has been proposed that uses a lower-dimensional projection of the value function. In this actor-critic method, approximating the value function is relatively easy because the dimension of the projection space is lower than that of the state-action space. Its application to real problems such as robot control is therefore expected to be comparatively straightforward. In our previous study, we presented a CPG-actor-critic model, a reinforcement learning model based on biological concepts, and applied it to an automatic control problem of a biped robot. In this report, we apply the actor-critic method based on the policy gradient method to the CPG-actor-critic model and show that our method achieves robust control of the biped robot.
Keywords (Japanese)	強化学習 / 方策勾配法 / Actor-critic法 / 二足歩行 / 中枢パターン生成器
Keywords (English)	Reinforcement learning / Policy gradient method / Actor-critic method / Biped locomotion / Central pattern generator
Report number	NC2003-206

Meeting information
Technical committee	NC
Dates	2004/3/12 (one-day meeting)

Paper details
Submitted to technical committee	Neurocomputing (NC)
Language of text	JPN
Title (Japanese)	二足歩行運動に対する方策勾配法に基づいた強化学習法
Title (English)	Reinforcement learning based on a policy gradient method for biped locomotion
Keyword (1) (Japanese/English)	強化学習 / Reinforcement learning
Keyword (2) (Japanese/English)	方策勾配法 / Policy gradient method
Keyword (3) (Japanese/English)	Actor-critic法 / Actor-critic method
Keyword (4) (Japanese/English)	二足歩行 / Biped locomotion
Keyword (5) (Japanese/English)	中枢パターン生成器 / Central pattern generator
First author (Japanese/English)	森健 / Takeshi MORI
First author affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Graduate School of Information Science, Nara Institute of Science and Technology
Second author (Japanese/English)	中村泰 / Yutaka NAKAMURA
Second author affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Graduate School of Information Science, Nara Institute of Science and Technology
Third author (Japanese/English)	石井信 / Shin ISHII
Third author affiliation (Japanese/English)	奈良先端科学技術大学院大学 / Graduate School of Information Science, Nara Institute of Science and Technology
Presentation date	2004/3/12
Report number	NC2003-206
Volume	vol.103
Issue number	734
Page range	pp.-
Number of pages	6