Presentation | 2020-12-02 Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process Naoki Ichijo, Yuta Nakahara, Yuto Motomura, Toshiyasu Matsushima |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | In Markov decision process (MDP) problems with unknown transition probabilities, a learning agent must learn the unknown information while collecting rewards at the same time. One way to handle this difficulty is to split the period of the MDP into two parts: the learning period and the earning period. In this paper, we consider the decision problem of choosing which actions to sample in order to learn the unknown information during the learning period of the divided MDP. We formulate the objective as maximization of the total discounted reward based on Bayes decision theory and derive a theoretical solution. However, its computational complexity grows exponentially with the length of the learning period. We therefore propose two approximation algorithms that reduce the computational complexity. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Markov decision process / Dynamic Programming / Policy iteration / Reinforcement learning / Bayes decision theory |
Paper # | IT2020-31 |
Date of Issue | 2020-11-24 (IT) |
Conference Information | |
Committee | IT |
---|---|
Conference Date | 2020/12/1 (3 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Online |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Lectures for Young Researchers, General |
Chair | Tadashi Wadayama(Nagoya Inst. of Tech.) |
Vice Chair | Tetsuya Kojima(Tokyo Kosen) |
Secretary | Tetsuya Kojima(Yamaguchi Univ.) |
Assistant | Takahiro Ohta(Senshu Univ.) |
Paper Information | |
Registration To | Technical Committee on Information Theory |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process |
Sub Title (in English) | |
Keyword(1) | Markov decision process |
Keyword(2) | Dynamic Programming |
Keyword(3) | Policy iteration |
Keyword(4) | Reinforcement learning |
Keyword(5) | Bayes decision theory |
1st Author's Name | Naoki Ichijo |
1st Author's Affiliation | Waseda University(Waseda Univ.) |
2nd Author's Name | Yuta Nakahara |
2nd Author's Affiliation | Waseda University(Waseda Univ.) |
3rd Author's Name | Yuto Motomura |
3rd Author's Affiliation | Waseda University(Waseda Univ.) |
4th Author's Name | Toshiyasu Matsushima |
4th Author's Affiliation | Waseda University(Waseda Univ.) |
Date | 2020-12-02 |
Paper # | IT2020-31 |
Volume (vol) | vol.120 |
Number (no) | IT-268 |
Page | pp.38-43 (IT) |
#Pages | 6 |
Date of Issue | 2020-11-24 (IT) |