Robust Reinforcement Learning

Jun Morimoto; Kenji Doya

Presentation title	2000/7/11 Robust Reinforcement Learning, Jun Morimoto, Kenji Doya
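Abstract (Japanese)	This paper proposes a reinforcement learning method that accounts for input disturbances and modeling errors. In reinforcement learning, a dynamics model of the environment or the controlled plant plays an important role, for example in off-line learning by simulation and in on-line action planning. However, because of the error between the real environment and the model, a controller learned on the model may fail to achieve the desired performance when applied directly to the real plant. We therefore consider, following the ideas of H∞ control theory, a differential game in which a disturbance generator outputs the worst-case disturbance and an action generator performs optimal control. This problem can be formulated as finding a min-max solution of a value function that takes into account both the change in reward caused by the disturbance and the magnitude of the disturbance itself. Using this insight, we present a method that estimates the value function and computes the worst-case disturbance and the optimal control on-line. We apply the proposed learning method to a single-pendulum swing-up task and show that it achieves control that is robust to modeling errors that conventional reinforcement learning cannot handle.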
Abstract (English)	This paper proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular both for off-line learning by simulation and for on-line action planning. However, the difference between the model and the real environment can lead to unpredictable, often unwanted results. Based on the theory of H∞ control, we consider a differential game in which a 'disturbing' agent tries to make the worst possible disturbance while a 'control' agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account the changes in the reward due to the disturbance and the amplitude of the disturbance. We derive on-line learning algorithms for estimating the value function and for calculating both the worst disturbance and the best control in reference to the value function. We tested the paradigm, which we call "Robust Reinforcement Learning" (RRL), on an inverted pendulum task. The control by RRL achieved robust performance against two-fold changes in the pendulum length, while a standard RL control could not deal with such environmental changes.
Keywords (Japanese)	強化学習 / ロバスト制御 / H無限大
Keywords (English)	reinforcement learning / robust control / H-infinity
Report number	NC2000-49
Publication date
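
The min-max formulation summarized in the abstract can be written out as a rough sketch. The notation below is an assumption made for illustration and does not come from this record: x is the state, u the control input, w the disturbance, r the task reward, τ a discount time constant, and γ a hypothetical weight on the disturbance magnitude.

```latex
% Illustrative notation only; all symbols are assumptions, not taken from this record.
% Augmented reward: task reward plus a term that grows with the disturbance magnitude.
q(t) = r\bigl(x(t), u(t)\bigr) + \gamma^{2}\, \lVert w(t) \rVert^{2}

% Value function: discounted integral of the augmented reward.
V\bigl(x(t)\bigr) = \int_{t}^{\infty} e^{-(s-t)/\tau}\, q(s)\, ds

% Differential game: the control agent maximizes, the disturbing agent minimizes.
V^{*}(x) = \max_{u} \min_{w} V(x)
```

Under this sketch, a larger γ makes disturbances more costly for the disturbing agent, so the resulting controller is tuned against weaker worst-case disturbances.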

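Technical committee information
Technical committee	NC
Conference dates	2000/7/11 (1 day)
Venue (Japanese)
Venue (English)
Theme (Japanese)
Theme (English)
Chair (Japanese)
Chair (English)
Vice chair (Japanese)
Vice chair (English)
Secretary (Japanese)
Secretary (English)
Assistant secretary (Japanese)
Assistant secretary (English)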

Paper information details
Registered technical committee	Neurocomputing (NC)
Language of text	ENG
Title (Japanese)	ロバスト強化学習
Subtitle (Japanese)
Title (English)	Robust Reinforcement Learning
Subtitle (English)
Keyword (1) (Japanese/English)	強化学習 / reinforcement learning
Keyword (2) (Japanese/English)	ロバスト制御 / robust control
Keyword (3) (Japanese/English)	H無限大 / H-infinity
First author name (Japanese/English)	森本淳 / Jun Morimoto
First author affiliation (Japanese/English)	奈良先端科学技術大学院大学情報科学研究科:科学技術振興事業団ERATO川人学習動態脳プロジェクト / Graduate School of Information Science, Nara Institute of Science and Technology:Kawato Dynamic Brain Project, ERATO JST
Second author name (Japanese/English)	銅谷賢治 / Kenji Doya
Second author affiliation (Japanese/English)	国際電気通信基礎技術研究所情報科学研究部:科学技術振興事業団ERATO学習動態脳プロジェクト:奈良先端科学技術大学院大学情報科学研究科 / Information Sciences Division, ATR International:CREST, JST:Graduate School of Information Science, Nara Institute of Science and Technology
Presentation date	2000/7/11
Report number	NC2000-49
Volume (vol)	vol.100
Issue (no)	191
Page range	pp.-
Number of pages	8
Publication date