Presentation 2020-12-02
Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process
Naoki Ichijo, Yuta Nakahara, Yuto Motomura, Toshiyasu Matsushima
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In Markov decision process (MDP) problems with unknown transition probabilities, a learning agent must learn the unknown information and collect rewards at the same time. One way to deal with this difficulty is to divide the period of the MDP into two parts: the learning period and the earning period. In this paper, we consider the decision problem of choosing which actions to sample in order to learn the unknown information during the learning period of the divided MDP. We formulate the objective as maximization of the total discounted reward based on Bayes decision theory and derive the theoretical solution. However, its computational complexity grows exponentially with the length of the learning period. Therefore, we propose two approximation algorithms that reduce the computational complexity.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Markov decision process / Dynamic Programming / Policy iteration / Reinforcement learning / Bayes decision theory
Paper # IT2020-31
Date of Issue 2020-11-24 (IT)

Conference Information
Committee IT
Conference Date 2020/12/1 (3 days)
Place (in Japanese) (See Japanese page)
Place (in English) Online
Topics (in Japanese) (See Japanese page)
Topics (in English) Lectures for Young Researchers, General
Chair Tadashi Wadayama(Nagoya Inst. of Tech.)
Vice Chair Tetsuya Kojima(Tokyo Kosen)
Secretary Tetsuya Kojima(Yamaguchi Univ.)
Assistant Takahiro Ohta(Senshu Univ.)

Paper Information
Registration To Technical Committee on Information Theory
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process
Sub Title (in English)
Keyword(1) Markov decision process
Keyword(2) Dynamic Programming
Keyword(3) Policy iteration
Keyword(4) Reinforcement learning
Keyword(5) Bayes decision theory
1st Author's Name Naoki Ichijo
1st Author's Affiliation Waseda University(Waseda Univ.)
2nd Author's Name Yuta Nakahara
2nd Author's Affiliation Waseda University(Waseda Univ.)
3rd Author's Name Yuto Motomura
3rd Author's Affiliation Waseda University(Waseda Univ.)
4th Author's Name Toshiyasu Matsushima
4th Author's Affiliation Waseda University(Waseda Univ.)
Date 2020-12-02
Paper # IT2020-31
Volume (vol) vol.120
Number (no) IT-268
Page pp.38-43(IT)
#Pages 6
Date of Issue 2020-11-24 (IT)