Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process

Presentation	2020-12-02 Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process Naoki Ichijo, Yuta Nakahara, Yuto Motomura, Toshiyasu Matsushima
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	In Markov decision process (MDP) problems with an unknown transition probability, a learning agent has to learn the unknown information and collect rewards at the same time. One way to deal with this difficulty is to separate the period of the MDP into two parts: the learning period and the earning period. In this paper, we consider the decision problem of sampling actions to learn the unknown information in the learning period of the divided MDP. We formulate the objective as maximization of the total discounted reward based on Bayes decision theory, and we derive a theoretical solution for it. However, the computational complexity of this solution grows exponentially with the length of the learning period. Therefore, we propose two approximation algorithms to reduce the computational complexity.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Markov decision process / Dynamic Programming / Policy iteration / Reinforcement learning / Bayes decision theory
Paper #	IT2020-31
Date of Issue	2020-11-24 (IT)

Conference Information
Committee	IT
Conference Date	2020/12/1 (3 days)
Place (in Japanese)	(See Japanese page)
Place (in English)	Online
Topics (in Japanese)	(See Japanese page)
Topics (in English)	Lectures for Young Researchers, General
Chair	Tadashi Wadayama (Nagoya Inst. of Tech.)
Vice Chair	Tetsuya Kojima (Tokyo Kosen)
Secretary	Tetsuya Kojima (Yamaguchi Univ.)
Assistant	Takahiro Ohta (Senshu Univ.)

Paper Information
Registration To	Technical Committee on Information Theory
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process
Sub Title (in English)
Keyword(1)	Markov decision process
Keyword(2)	Dynamic Programming
Keyword(3)	Policy iteration
Keyword(4)	Reinforcement learning
Keyword(5)	Bayes decision theory
1st Author's Name	Naoki Ichijo
1st Author's Affiliation	Waseda University (Waseda Univ.)
2nd Author's Name	Yuta Nakahara
2nd Author's Affiliation	Waseda University (Waseda Univ.)
3rd Author's Name	Yuto Motomura
3rd Author's Affiliation	Waseda University (Waseda Univ.)
4th Author's Name	Toshiyasu Matsushima
4th Author's Affiliation	Waseda University (Waseda Univ.)
Date	2020-12-02
Paper #	IT2020-31
Volume (vol)	vol.120
Number (no)	IT-268
Page	pp.38-43 (IT)
#Pages	6
Date of Issue	2020-11-24 (IT)