Relationship between Computational Performance and Task Difficulty of Reinforcement Learning Methods Using Reward Machines

Ryuji Watanabe; Gouhei Tanaka

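Presentation	2022-03-29 Relationship between Computational Performance and Task Difficulty of Reinforcement Learning Methods Using Reward Machines, Ryuji Watanabe (The Univ. of Tokyo), Gouhei Tanaka (The Univ. of Tokyo)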
Abstract	In reinforcement learning, tasks whose reward is not determined immediately require the learner to take the history of past state transitions into account. A reward machine divides a task into parts and learns the reward function for each part of the process; reinforcement learning methods that use reward machines have been shown to learn faster than conventional methods such as Q-learning and to guarantee convergence to the optimal solution. In this report, we conduct numerical experiments in a grid-like environment on several tasks that differ in the number of symbols required to obtain a reward, the structure of the reward function, and the environment settings, and we evaluate how the reward acquisition rate changes from episode to episode. Based on the experimental results, we also discuss the effect of task difficulty on computational performance.
Keywords	Reinforcement learning / Non-Markov decision process / Reward Machines
Report number	MSS2021-70, NLP2021-141
Publication date	2022-03-21 (MSS, NLP)
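
The abstract describes a reward machine as a structure that splits a task into parts, each with its own reward, so that the reward can depend on the history of observed symbols. The following is a minimal sketch of that idea as a finite-state machine, not the authors' implementation. The class `RewardMachine`, the helper `run`, the state names `u0`, `u1`, `u_done`, and the "observe `a`, then `b`" task are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine whose transitions are triggered by symbols and emit rewards."""
    initial: str
    terminal: set
    # (state, symbol) -> (next_state, reward).
    # Pairs that are not listed keep the current state and give reward 0.
    delta: dict = field(default_factory=dict)

    def step(self, state, symbol):
        return self.delta.get((state, symbol), (state, 0.0))


# Hypothetical task (an assumption, not taken from the report):
# observe symbol "a", then symbol "b", to earn a reward of 1.
rm = RewardMachine(
    initial="u0",
    terminal={"u_done"},
    delta={("u0", "a"): ("u1", 0.0), ("u1", "b"): ("u_done", 1.0)},
)


def run(rm, symbols):
    """Feed a sequence of observed symbols; return (total reward, final state)."""
    state, total = rm.initial, 0.0
    for s in symbols:
        if state in rm.terminal:
            break
        state, r = rm.step(state, s)
        total += r
    return total, state
```

In this sketch the same symbol `b` yields a reward only after `a` has been seen, which is the kind of history dependence a plain Markov reward function cannot express. The number of transitions needed to reach `u_done` corresponds loosely to the "number of symbols required to obtain a reward" that the abstract varies across tasks.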

Conference information
Committee	MSS / NLP
Conference dates	2022-03-28 (2 days)
Venue	Online
Theme	MSS, NLP, general topics, and Work In Progress (MSS only)
Chair (Japanese)	尾崎敦夫(阪工大) / 高坂拓司(中京大)
Chair (English)	Atsuo Ozaki (Osaka Inst. of Tech.) / Takuji Kosaka (Chukyo Univ.)
Vice chair (Japanese)	山口真悟(山口大) / 常田明夫(熊本大)
Vice chair (English)	Shingo Yamaguchi (Yamaguchi Univ.) / Akio Tsuneda (Kumamoto Univ.)
Secretary (Japanese)	小林孝一(北大) / 劉健全(NEC) / 松下春奈(香川大) / 吉岡大三郎(崇城大)
Secretary (English)	Koichi Kobayashi (Hokkaido Univ.) / Jianquan Liu (NEC) / Haruna Matsushita (Kagawa Univ.) / Daizaburo Yoshioka (Sojo Univ.)
Assistant secretary (Japanese)	白井匡人(島根大) / 加藤秀行(大分大) / 横井裕一(長崎大)
Assistant secretary (English)	Masato Shirai (Shimane Univ.) / Hideyuki Kato (Oita Univ.) / Yuichi Yokoi (Nagasaki Univ.)

Paper information details
Registered committee	Technical Committee on Mathematical Systems Science and its Applications / Technical Committee on Nonlinear Problems
Language of the paper	JPN
Title (Japanese)	リワードマシンを用いる強化学習手法の計算性能とタスク難易度の関係
Subtitle (Japanese)
Title (English)	Relationship between Computational Performance and Task Difficulty of Reinforcement Learning Methods Using Reward Machines
Subtitle (English)	*
Keyword (1)	Reinforcement learning
Keyword (2)	Non-Markov decision process
Keyword (3)	Reward Machines
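First author name (Japanese/English)	渡邊隆二 / Ryuji Watanabe
First author affiliation (Japanese/English)	東京大学 (abbrev.: 東大) / The University of Tokyo (abbrev.: The Univ. of Tokyo)
Second author name (Japanese/English)	田中剛平 / Gouhei Tanaka
Second author affiliation (Japanese/English)	東京大学 (abbrev.: 東大) / The University of Tokyo (abbrev.: The Univ. of Tokyo)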
Presentation date	2022-03-29
Report number	MSS2021-70, NLP2021-141
Volume (vol)	vol.121
Issue (no)	MSS-443, NLP-444
Page range	pp.77-82 (MSS), pp.77-82 (NLP)
Number of pages	6
Publication date	2022-03-21 (MSS, NLP)