Robustness of time-consistent Markov decision processes (15th Workshop on Information-Based Induction Sciences)

Takayuki Osogami

Presentation title	2012-11-07 Robustness of time-consistent Markov decision processes (15th Workshop on Information-Based Induction Sciences), Takayuki Osogami
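Abstract (translated from Japanese)	We show that when the objective function of a Markov decision process (MDP) is an iterated risk measure with monotonicity, an optimal policy for the MDP can be found by dynamic programming. When the monotonic iterated risk measure is also translation-invariant, we show that the optimal policy can be found more efficiently. We show that risk preferences that appear reasonable but cannot be represented by expected utility can be represented by iterated risk measures. Furthermore, we show that an MDP whose objective is to minimize a certain iterated risk measure can be interpreted as a robust MDP. A robust MDP assumes that the parameter values of the MDP are uncertain and aims to minimize a quantity such as the expected cumulative cost in the worst case. Specifically, we show that an MDP that minimizes the expected exponential utility is equivalent to a robust MDP that minimizes, in the worst case, the expectation minus the Kullback-Leibler divergence measuring the deviation of the parameters from their nominal values. We also show that an MDP that minimizes an iterated risk measure composed of coherent risk measures is equivalent to a robust MDP whose uncertainty is characterized by a certain concave function.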
Abstract (English)	We show that an optimal policy for a Markov decision process (MDP) can be found with dynamic programming when the objective is to minimize an iterated risk measure (IRM) that has a property we call monotonicity. When the monotonic IRM has the additional property of translation-invariance, we show that the optimal policy can be found more efficiently. We then demonstrate that expected utility is inflexible in representing a significant set of risk-preferences that appear to be reasonable, but that such risk-preferences can be represented by IRMs. Furthermore, we show that an MDP with the objective of minimizing an IRM can be interpreted as a robust MDP, whose objective is to minimize a function, such as the expectation of cumulative cost, for the worst case when the parameters have uncertainties. Specifically, we show that an MDP that minimizes the expected exponential utility is equivalent to a robust MDP that minimizes the worst-case expectation with a penalty for the deviation of the uncertain parameters from their nominal values, which is measured with the Kullback-Leibler divergence. We also show that an MDP that minimizes an IRM composed of certain coherent risk measures is equivalent to a robust MDP that minimizes the worst-case expectation when the possible deviations of uncertain parameters from their nominal values are characterized with a concave function.
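The exponential-utility claim in the abstract can be illustrated for a single decision step. The sketch below is not taken from the report. It assumes the standard Gibbs variational identity, (1/gamma) log E_p[exp(gamma X)] = max over q of { E_q[X] - (1/gamma) KL(q || p) }, in which p is the nominal distribution and q ranges over alternative distributions. The cost values, the nominal probabilities, and the risk-sensitivity parameter `gamma` are hypothetical, and the report's own formulation and sign conventions may differ.

```python
import math

def entropic_risk(costs, probs, gamma):
    """(1/gamma) * log E_p[exp(gamma * X)] for a discrete cost distribution."""
    m = max(costs)  # shift the exponent for numerical stability
    s = sum(p * math.exp(gamma * (x - m)) for x, p in zip(costs, probs))
    return m + math.log(s) / gamma

def kl_divergence(q, p):
    """Kullback-Leibler divergence KL(q || p) for discrete distributions."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def penalized_expectation(costs, q, p, gamma):
    """E_q[X] - (1/gamma) * KL(q || p): expectation under q with a KL penalty."""
    mean_q = sum(qi * x for qi, x in zip(q, costs))
    return mean_q - kl_divergence(q, p) / gamma

def worst_case_distribution(costs, probs, gamma):
    """Exponentially tilted distribution, q_i proportional to p_i * exp(gamma * x_i)."""
    m = max(costs)
    weights = [p * math.exp(gamma * (x - m)) for x, p in zip(costs, probs)]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical one-step example: three possible costs under a nominal model.
costs = [1.0, 4.0, 10.0]
nominal = [0.5, 0.3, 0.2]
gamma = 0.3  # assumed risk-sensitivity parameter (gamma > 0)

risk = entropic_risk(costs, nominal, gamma)
q_star = worst_case_distribution(costs, nominal, gamma)
robust = penalized_expectation(costs, q_star, nominal, gamma)

print("entropic risk under nominal model:", risk)
print("KL-penalized worst-case expectation:", robust)
```

The two printed values agree up to rounding, and any other choice of q gives a penalized expectation that is no larger. Minimizing the expected exponential utility E[exp(gamma X)] is equivalent to minimizing this entropic risk, because the latter is a monotone transformation of the former.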
Keywords	Markov decision process / Time-consistency / Iterated risk measure / Expected utility / Dynamic programming / Risk / Robust
Report number	IBISML2012-40

Conference information
Committee	IBISML
Conference date	2012/10/31 (1 day)

Paper information details
Registered to	Information-Based Induction Sciences and Machine Learning (IBISML)
Language of text	ENG
Title (translated from Japanese)	Robustness of time-consistent Markov decision processes (15th Workshop on Information-Based Induction Sciences)
Title (English)	Robustness of time-consistent Markov decision processes
Keyword (1)	Markov decision process
Keyword (2)	Time-consistency
Keyword (3)	Iterated risk measure
Keyword (4)	Expected utility
Keyword (5)	Dynamic programming
Keyword (6)	Risk
Keyword (7)	Robust
1st author's name	Takayuki OSOGAMI
1st author's affiliation	IBM Japan, Ltd., Tokyo Research Laboratory (IBM Research-Tokyo)
Presentation date	2012-11-07
Report number	IBISML2012-40
Volume (vol)	vol.112
Issue (no)	279
Page range	pp.-
Number of pages	8