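A Modification Algorithm of Function Approximator for the Reinforcement Learning with Reusing Mechanism of Avoidance Actions: Proposal and its Application to Motion Learning of Multi-Link Robot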

Akihiko Yamaguchi; Norikazu Sugimoto; Mitsuo Kawato

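Presentation	2007-12-22. A Modification Algorithm of Function Approximator for the Reinforcement Learning with Reusing Mechanism of Avoidance Actions: Proposal and its Application to Motion Learning of Multi-Link Robot. Akihiko Yamaguchi, Norikazu Sugimoto, Mitsuo Kawato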
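Abstract (translated from Japanese)	As one way to reduce the learning cost (such as damage from falling) that arises when learning methods such as reinforcement learning are applied to robot motion learning, we have proposed, within the reinforcement learning framework, a method that learns avoidance actions separately during the learning of one task and reuses them when learning other tasks, and have applied it to a robot of about four links whose base is not fixed [1]. In this paper, we propose a method that modifies the basis functions in order to improve the separation performance of this separate learning, and show its effectiveness in motion learning. We further examine whether reusing avoidance actions reduces the damage from falling during motion learning.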
Abstract (English)	Applying a learning method such as reinforcement learning to the motion learning of multi-link robots incurs a large cost, such as damage from falling down. To overcome this problem, we previously proposed a reuse mechanism for reinforcement learning in which avoidance actions, such as actions to keep from falling down, are learned separately from primary actions and then reused when learning new tasks [1]. We also developed a method for applying it to learning whole-body motions of a 4-link robot whose base is not fixed to the ground. In this paper, we propose a new method that modifies the basis functions of the function approximator of the action value function to improve the separation performance, and demonstrate that the method works effectively in learning whole-body motions of a multi-link robot. Furthermore, we investigate whether the learning cost of damage from falling down in learning whole-body motions is reduced by reusing avoidance actions.
Keywords (Japanese)	運動学習 / 強化学習 / 再利用 / 回避行動 / 跳躍 / サーブ
Keywords (English)	motion learning / reinforcement learning / reusing / avoidance actions / jumping / serve
Report number	NC2007-86

Meeting information
Meeting	NC
Dates	2007/12/15 (1 day, starting on this date)

Paper details
Hosting meeting	Neurocomputing (NC)
Language of text	JPN
Title (Japanese)	回避行動の再利用メカニズムを備えた強化学習のための関数近似器修正手法と多関節ロボットへの応用
Title (English)	A Modification Algorithm of Function Approximator for the Reinforcement Learning with Reusing Mechanism of Avoidance Actions: Proposal and its Application to Motion Learning of Multi-Link Robot
Keyword 1 (Japanese/English)	運動学習 / motion learning
Keyword 2 (Japanese/English)	強化学習 / reinforcement learning
Keyword 3 (Japanese/English)	再利用 / reusing
Keyword 4 (Japanese/English)	回避行動 / avoidance actions
Keyword 5 (Japanese/English)	跳躍 / jumping
Keyword 6 (Japanese/English)	サーブ / serve
First author (Japanese/English)	山口明彦 / Akihiko YAMAGUCHI
First author affiliation (Japanese/English)	奈良先端科学技術大学院大学; ATR脳情報研究所 / Nara Institute of Science and Technology; ATR Computational Neuroscience Laboratories
Second author (Japanese/English)	杉本徳和 / Norikazu SUGIMOTO
Second author affiliation (Japanese/English)	ATR脳情報研究所 / ATR Computational Neuroscience Laboratories
Third author (Japanese/English)	川人光男 / Mitsuo KAWATO
Third author affiliation (Japanese/English)	ATR脳情報研究所; 奈良先端科学技術大学院大学 / ATR Computational Neuroscience Laboratories; Nara Institute of Science and Technology
Presentation date	2007-12-22
Report number	NC2007-86
Volume	vol.107
Issue number	410
Page range	pp.-
Number of pages	6