Robustness of Time-Consistent Markov Decision Processes (15th Workshop on Information-Based Induction Sciences)

Takayuki Osogami (恐神 貴行)

Presentation	2012-11-07, Robustness of time-consistent Markov decision processes, Takayuki OSOGAMI
Abstract (in English)	We show that, when the objective is to minimize an iterated risk measure (IRM) that has a property we call monotonicity, an optimal policy for a Markov decision process (MDP) can be found with dynamic programming. When the monotonic IRM is additionally translation-invariant, we show that the optimal policy can be found more efficiently. We then demonstrate that expected utility is too inflexible to represent a significant set of seemingly reasonable risk preferences, whereas IRMs can represent them. Furthermore, we show that an MDP whose objective is to minimize an IRM can be interpreted as a robust MDP, whose objective is to minimize a function, such as the expected cumulative cost, under the worst case over uncertain parameters. Specifically, we show that an MDP of minimizing the expected exponential utility is equivalent to a robust MDP of minimizing the worst-case expectation with a penalty, measured by the Kullback-Leibler divergence, for the deviation of the uncertain parameters from their nominal values. We also show that an MDP of minimizing an IRM composed of certain coherent risk measures is equivalent to a robust MDP of minimizing the worst-case expectation when the possible deviations of the uncertain parameters from their nominal values are characterized by a concave function.
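As an illustration only (not the author's code), the following Python sketch applies backward induction to a small finite-horizon MDP whose per-step risk measure is assumed to be the entropic risk measure (1/θ) log E[exp(θX)], the certainty-equivalent cost under expected exponential utility. The function names entropic_risk, worst_case_penalized, and solve_iterated_entropic_mdp, the toy states and costs, and the choice of the entropic measure are all hypothetical assumptions for this example. Because the entropic measure is translation-invariant, the immediate cost is added outside the risk measure. The sketch also checks numerically the standard variational identity (1/θ) log E_p[exp(θX)] = max_q {E_q[X] - (1/θ) KL(q||p)}, one way to see a KL-penalized worst-case expectation of the kind the abstract mentions; the paper's exact formulation may differ.

```python
import math


def entropic_risk(values, probs, theta):
    """Entropic risk (1/theta) * log E_p[exp(theta * X)] of a discrete cost X.

    theta > 0 expresses risk aversion. A max-shift (log-sum-exp) keeps exp() stable.
    """
    m = max(values)
    s = sum(p * math.exp(theta * (v - m)) for v, p in zip(values, probs))
    return m + math.log(s) / theta


def worst_case_penalized(values, probs, theta):
    """max_q { E_q[X] - (1/theta) * KL(q || p) }, evaluated at its known maximizer
    q proportional to p * exp(theta * X) (variational formula for entropic risk)."""
    m = max(values)
    w = [p * math.exp(theta * (v - m)) for v, p in zip(values, probs)]
    z = sum(w)
    q = [wi / z for wi in w]
    expectation = sum(qi * v for qi, v in zip(q, values))
    kl = sum(qi * math.log(qi / p) for qi, p in zip(q, probs) if qi > 0)
    return expectation - kl / theta


def solve_iterated_entropic_mdp(actions, cost, trans, horizon, theta):
    """Backward induction V_t(s) = min_a [cost(s, a) + rho(V_{t+1}(S'))].

    actions: state -> list of actions; cost: (state, action) -> immediate cost;
    trans: (state, action) -> {next_state: probability}. Terminal values are 0.
    Returns (V_0, policy) where policy[t][s] is the chosen action at step t.
    """
    V = {s: 0.0 for s in actions}
    policy = []
    for _ in range(horizon):
        new_V, pi = {}, {}
        for s, acts in actions.items():
            best_a, best_val = None, math.inf
            for a in acts:
                nxt = trans[(s, a)]
                risk = entropic_risk([V[n] for n in nxt], list(nxt.values()), theta)
                val = cost[(s, a)] + risk  # translation invariance: cost moves outside rho
                if val < best_val:
                    best_a, best_val = a, val
            new_V[s], pi[s] = best_val, best_a
        V = new_V
        policy.insert(0, pi)  # steps are solved backward, so prepend
    return V, policy


# Toy example (hypothetical): "gamble" has lower expected cost but more variance.
actions = {"start": ["safe", "gamble"], "good": ["stay"], "bad": ["stay"]}
cost = {("start", "safe"): 1.0, ("start", "gamble"): 0.0,
        ("good", "stay"): 0.0, ("bad", "stay"): 1.8}
trans = {("start", "safe"): {"good": 1.0},
         ("start", "gamble"): {"good": 0.5, "bad": 0.5},
         ("good", "stay"): {"good": 1.0},
         ("bad", "stay"): {"bad": 1.0}}

for theta in (0.01, 2.0):
    V, policy = solve_iterated_entropic_mdp(actions, cost, trans, horizon=2, theta=theta)
    print(theta, policy[0]["start"], round(V["start"], 4))
```

In this toy instance, a nearly risk-neutral θ = 0.01 chooses "gamble" at the start (its expected cost 0.9 is below 1), whereas θ = 2 chooses "safe", because the entropic risk of the gamble's cost (about 1.47) exceeds the sure cost of 1.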
Keyword (in English)	Markov decision process / Time-consistency / Iterated risk measure / Expected utility / Dynamic programming / Risk / Robust
Paper #	IBISML2012-40

Conference Information
Committee	IBISML
Conference Date	2012/10/31 (1 day)

Paper Information
Registration To	Information-Based Induction Sciences and Machine Learning (IBISML)
Language	ENG
Title (in English)	Robustness of time-consistent Markov decision processes
1st Author's Name	Takayuki OSOGAMI
1st Author's Affiliation	IBM Research - Tokyo
Date	2012-11-07
Volume (vol)	vol.112
Number (no)	279
#Pages	8