Reinforcement learning using on-line EM algorithm

Shin Ishii

Presentation title	1999/2/5 Reinforcement learning using on-line EM algorithm, Shin Ishii
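Abstract (Japanese)	We propose a new reinforcement learning method based on the actor-critic model. Both the actor and the critic are approximated by normalized Gaussian networks and are trained with the on-line EM algorithm that we proposed previously. We applied the new method to the task of swinging up and inverting a single pendulum and to the task of balancing a double pendulum near the upright position. The results show that the method can be applied to optimal control problems with continuous state spaces and continuous control-signal spaces.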
Abstract (English)	In this research report, we propose a new reinforcement learning (RL) method based on an actor-critic architecture. The actor and the critic are approximated by normalized Gaussian networks, which are trained by the on-line EM algorithm proposed in our previous paper. We apply our RL method to the task of swinging up and stabilizing a single pendulum and the task of balancing a double pendulum near the upright position. The experimental results show that our RL method can be applied to optimal control problems having continuous state/action spaces.
Keywords (Japanese)	EMアルゴリズム / オンライン学習 / 確率モデル / 強化学習 / Actor-Criticモデル / 二重振子
Keywords (English)	EM algorithm / Online learning / Stochastic model / Reinforcement learning / Actor-critic model / Double pendulum
Report number	NC98-83
Publication date
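
The abstract states that the actor and the critic are each approximated by a normalized Gaussian network but gives no equations. As a rough illustration only, the sketch below evaluates the forward pass of one commonly used form of such a network, in which each Gaussian unit carries a local linear model and the unit outputs are blended by normalized Gaussian activations. The local linear form, the diagonal covariances, and the names `Unit`, `responsibilities`, and `ngnet_output` are assumptions made for this sketch and are not taken from the report. The on-line EM training step is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Unit:
    """One Gaussian unit with a local linear model (hypothetical structure)."""
    mean: List[float]    # center of the Gaussian
    var: List[float]     # diagonal variances, each must be > 0
    weight: List[float]  # slope of the local linear model
    bias: float          # offset of the local linear model


def log_density(x: Sequence[float], u: Unit) -> float:
    """Log of a diagonal-covariance Gaussian density evaluated at x."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - 0.5 * (xi - m) ** 2 / v
        for xi, m, v in zip(x, u.mean, u.var)
    )


def responsibilities(x: Sequence[float], units: Sequence[Unit]) -> List[float]:
    """Normalized Gaussian activations N_i(x); they are positive and sum to 1."""
    logs = [log_density(x, u) for u in units]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    exps = [math.exp(value - top) for value in logs]
    total = sum(exps)
    return [e / total for e in exps]


def ngnet_output(x: Sequence[float], units: Sequence[Unit]) -> float:
    """Scalar output sum_i N_i(x) * (w_i . x + b_i)."""
    acts = responsibilities(x, units)
    return sum(
        a * (sum(w * xi for w, xi in zip(u.weight, x)) + u.bias)
        for a, u in zip(acts, units)
    )


# Two units placed symmetrically around the origin (made-up parameters).
units = [
    Unit(mean=[-1.0], var=[1.0], weight=[0.0], bias=-1.0),
    Unit(mean=[1.0], var=[1.0], weight=[0.0], bias=1.0),
]
print(ngnet_output([0.0], units))
```

With these made-up parameters the two units are equally active at the origin, so their opposite biases cancel. Far to the right of the origin the second unit dominates and the output approaches its bias of 1.0. In an actor-critic setting one such network could map a state to a control signal and another could map a state to a value estimate, but how the report wires them together is not stated in the abstract.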

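Technical committee information
Technical committee	NC
Conference dates	1999/2/5 (1-day meeting)
Venue (Japanese)
Venue (English)
Theme (Japanese)
Theme (English)
Chair (Japanese)
Chair (English)
Vice chair (Japanese)
Vice chair (English)
Secretary (Japanese)
Secretary (English)
Assistant secretary (Japanese)
Assistant secretary (English)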

Paper information details
Technical committee applied to	Neurocomputing (NC)
Language of text	JPN
Title (Japanese)	オンラインEMアルゴリズムを用いた強化学習法
Subtitle (Japanese)
Title (English)	Reinforcement learning using on-line EM algorithm
Subtitle (English)
Keyword (1) (Japanese/English)	EMアルゴリズム / EM algorithm
Keyword (2) (Japanese/English)	オンライン学習 / Online learning
Keyword (3) (Japanese/English)	確率モデル / Stochastic model
Keyword (4) (Japanese/English)	強化学習 / Reinforcement learning
Keyword (5) (Japanese/English)	Actor-Criticモデル / Actor-critic model
Keyword (6) (Japanese/English)	二重振子 / Double pendulum
First author name (Japanese/English)	石井信 / Shin Ishii
First author affiliation (Japanese/English)	奈良先端科学技術大学院大学情報科学研究科 / Nara Institute of Science and Technology
Presentation date	1999/2/5
Report number	NC98-83
Volume (vol)	vol.98
Issue (no)	577
Page range	pp.-
Number of pages	8
Publication date