An Analysis of Reinforcement Learning Using Typical Sequences

Kazunori Iwata; Kazushi Ikeda; Hideaki Sakai

Presentation	2004/3/12 An Analysis of Reinforcement Learning Using Typical Sequences. Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai
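Abstract (translated from Japanese)	An important property called the asymptotic equipartition property holds on empirical sequences in reinforcement learning. In this paper, we use this property to give an explicit definition of exploration. We also show that return maximization is characterized by the stochastic complexity and a quantity that depends on the environment. Furthermore, we clarify the sensitivity of the stochastic complexity, which is useful for tuning the parameters of the action selection strategy, and a bound on the rate at which the empirical sequence converges to the optimal sequence. Here, the optimal sequence is the empirical sequence that yields the maximal return.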
Abstract (English)	An important property called the asymptotic equipartition property holds on empirical sequences in reinforcement learning. Using this property, we elucidate the explicit performance of exploration and show that return maximization is characterized by two factors: the stochastic complexity and a quantity depending on the parameters of the environment. We also examine the sensitivity of the stochastic complexity, which is useful for appropriately tuning the parameters of the action selection strategy, and give a bound on the convergence speed of the divergence between the empirical sequence and the best empirical sequence, that is, the one that produces a maximal return.
Keywords	reinforcement learning / Markov decision process / typical sequence / asymptotic equipartition property / stochastic complexity
Report number	NC2003-202

Conference information
Committee	NC
Conference date	2004/3/12 (1 day)

Paper information
Registered to	Neurocomputing (NC)
Language	ENG
Title (Japanese)	典型系列を使った強化学習の解析
Title (English)	An Analysis of Reinforcement Learning Using Typical Sequences
Keyword (1)	reinforcement learning
Keyword (2)	Markov decision process
Keyword (3)	typical sequence
Keyword (4)	asymptotic equipartition property
Keyword (5)	stochastic complexity
1st author's name (Japanese/English)	岩田一貴 / Kazunori IWATA
1st author's affiliation (Japanese/English)	京都大学大学院情報学研究科システム科学専攻 / Department of Systems Science, Graduate School of Informatics, Kyoto University
2nd author's name (Japanese/English)	池田和司 / Kazushi IKEDA
2nd author's affiliation (Japanese/English)	京都大学大学院情報学研究科システム科学専攻 / Department of Systems Science, Graduate School of Informatics, Kyoto University
3rd author's name (Japanese/English)	酒井英昭 / Hideaki SAKAI
3rd author's affiliation (Japanese/English)	京都大学大学院情報学研究科システム科学専攻 / Department of Systems Science, Graduate School of Informatics, Kyoto University
Presentation date	2004/3/12
Report number	NC2003-202
Volume (vol)	vol.103
Number (no)	734
Page range	pp.-
Number of pages	6