Paper Abstract and Keywords |
Presentation |
2020-12-02 10:00
Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process
Naoki Ichijo, Yuta Nakahara, Yuto Motomura, Toshiyasu Matsushima (Waseda Univ.)
IT2020-31 |
Abstract |
(in Japanese) |
(See Japanese page) |
(in English) |
In Markov decision process (MDP) problems with unknown transition probabilities, a learning agent must learn the unknown information and collect rewards at the same time. One way to deal with this difficulty is to divide the MDP horizon into two parts: a learning period and an earning period. In this paper, we consider the decision problem of choosing which actions to sample during the learning period of the divided MDP in order to learn the unknown information. We formulate the objective as maximization of the total discounted reward based on Bayes decision theory and derive a theoretical solution. However, its computational complexity grows exponentially with the length of the learning period. We therefore propose two approximation algorithms that reduce the computational complexity. |
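The abstract does not give the formulation itself, so the following is only a minimal sketch of the general setting under stated assumptions, not the authors' algorithm. It assumes a finite MDP with known rewards, a symmetric Dirichlet prior over each transition row, and an earning period that plans on the posterior-mean model by value iteration. All function names (`posterior_mean`, `value_iteration`, `n_histories`) are hypothetical. The last function illustrates one common way to see the exponential cost: an exact Bayes-optimal plan over a learning period of length T has to consider every sequence of (action, observed next state) pairs.

```python
def posterior_mean(counts, prior=1.0):
    """Posterior-mean transition model under a symmetric Dirichlet prior (assumed).

    counts[s][a][s2] is the number of observed transitions s --a--> s2.
    """
    model = []
    for state_counts in counts:
        rows = []
        for action_counts in state_counts:
            total = sum(action_counts) + prior * len(action_counts)
            rows.append([(c + prior) / total for c in action_counts])
        model.append(rows)
    return model


def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Optimal discounted state values for a known model P[s][a][s2] and rewards R[s][a]."""
    values = [0.0] * len(P)
    while True:
        updated = [
            max(
                R[s][a] + gamma * sum(p * values[s2] for s2, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(len(P))
        ]
        if max(abs(x - y) for x, y in zip(values, updated)) < tol:
            return updated
        values = updated


def n_histories(n_states, n_actions, T):
    """Number of distinct (action, next-state) histories of length T from a fixed start state."""
    return (n_actions * n_states) ** T


# Two states, two actions; transition counts gathered during a learning period.
counts = [[[2, 0], [0, 0]], [[0, 0], [1, 1]]]
P_hat = posterior_mean(counts)
R = [[1.0, 0.0], [0.0, 2.0]]  # rewards assumed known for this sketch
V = value_iteration(P_hat, R)
print(P_hat[0][0], V, n_histories(2, 2, 10))
```

Even for this two-state, two-action example, a learning period of length 10 already yields over a million histories, which is the kind of growth that motivates approximation.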
Keyword |
(in Japanese) |
(See Japanese page) |
(in English) |
Markov decision process / Dynamic Programming / Policy iteration / Reinforcement learning / Bayes decision theory |
Reference Info. |
IEICE Tech. Rep., vol. 120, no. 268, IT2020-31, pp. 38-43, Dec. 2020. |
Paper # |
IT2020-31 |
Date of Issue |
2020-11-24 (IT) |
ISSN |
Online edition: ISSN 2432-6380 |
Copyright and reproduction |
All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |
Conference Information |
Committee |
IT |
Conference Date |
2020-12-01 - 2020-12-03 |
Place (in Japanese) |
(See Japanese page) |
Place (in English) |
Online |
Topics (in Japanese) |
(See Japanese page) |
Topics (in English) |
Lectures for Young Researchers, General |
Paper Information |
Registration To |
IT |
Conference Code |
2020-12-IT |
Language |
Japanese |
Title (in Japanese) |
(See Japanese page) |
Sub Title (in Japanese) |
(See Japanese page) |
Title (in English) |
Policy Optimization Based on Bayesian Decision Theory in Learning Period on Markov Decision Process |
Keyword(1) |
Markov decision process |
Keyword(2) |
Dynamic Programming |
Keyword(3) |
Policy iteration |
Keyword(4) |
Reinforcement learning |
Keyword(5) |
Bayes decision theory |
1st Author's Name |
Naoki Ichijo |
1st Author's Affiliation |
Waseda University (Waseda Univ.) |
2nd Author's Name |
Yuta Nakahara |
2nd Author's Affiliation |
Waseda University (Waseda Univ.) |
3rd Author's Name |
Yuto Motomura |
3rd Author's Affiliation |
Waseda University (Waseda Univ.) |
4th Author's Name |
Toshiyasu Matsushima |
4th Author's Affiliation |
Waseda University (Waseda Univ.) |
Speaker |
Author-1 |
Date Time |
2020-12-02 10:00:00 |
Presentation Time |
20 minutes |
Registration for |
IT |
Paper # |
IT2020-31 |
Volume (vol) |
vol.120 |
Number (no) |
no.268 |
Page |
pp.38-43 |
#Pages |
6 |
Date of Issue |
2020-11-24 (IT) |
|