Presentation | An Online Policy Gradient Algorithm for Continuous State and Action Markov Decision Processes with Bandit Feedback (2014-11-17) Yao MA, Masashi SUGIYAMA |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | We consider the learning problem under an online Markov decision process (MDP), which aims at learning the time-dependent decision-making policy of an agent so as to minimize the regret, i.e., the difference from the best fixed policy. The difficulty of online MDP learning is that the reward function changes over time. In this paper, we show that a simple online policy gradient algorithm achieves regret O(√ |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Markov decision process / online learning / reinforcement learning |
Paper # | IBISML2014-53 |
Date of Issue |
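The abstract describes an online policy gradient method for continuous state and action MDPs under bandit feedback, where only the reward of the chosen action is observed at each round. As a rough illustration only, and not the authors' actual algorithm, the sketch below shows a single-sample (REINFORCE-style) policy gradient update for a hypothetical linear-Gaussian policy on a toy continuous state/action problem; the reward function, step size, and all variable names are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical illustration (not the paper's exact method): one online
# policy gradient step for a linear-Gaussian policy pi(a|s) = N(theta^T s, sigma^2).
# Under bandit feedback only the reward of the sampled action is observed,
# so the gradient is estimated from that single sample.
def online_pg_step(theta, s, sigma, reward_fn, eta):
    mean = theta @ s
    a = rng.normal(mean, sigma)              # sample action from the current policy
    r = reward_fn(s, a)                      # bandit feedback: one scalar reward
    grad_logpi = (a - mean) / sigma**2 * s   # grad_theta log pi(a|s)
    return theta + eta * r * grad_logpi      # stochastic gradient ascent on reward

# Toy stationary problem: reward peaks when the action matches theta_star^T s.
theta_star = np.array([1.0, -0.5])
reward = lambda s, a: -(a - theta_star @ s) ** 2

theta = np.zeros(2)
for t in range(20000):
    s = rng.normal(size=2)                   # state (i.i.d. here for simplicity)
    theta = online_pg_step(theta, s, sigma=0.5, reward_fn=reward, eta=0.005)

print(np.round(theta, 1))
```

In the paper's online setting the reward function itself changes over time; this toy example keeps it fixed only to make the single-sample gradient update easy to check.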
Conference Information | |
Committee | IBISML |
---|---|
Conference Date | 2014/11/10 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Information-Based Induction Sciences and Machine Learning (IBISML) |
---|---|
Language | ENG |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | An Online Policy Gradient Algorithm for Continuous State and Action Markov Decision Processes with Bandit Feedback |
Sub Title (in English) | |
Keyword(1) | Markov decision process |
Keyword(2) | online learning |
Keyword(3) | reinforcement learning |
1st Author's Name | Yao MA |
1st Author's Affiliation | Department of Computer Science, Tokyo Institute of Technology |
2nd Author's Name | Masashi SUGIYAMA |
2nd Author's Affiliation | Department of Complexity Science and Engineering, University of Tokyo |
Date | 2014-11-17 |
Paper # | IBISML2014-53 |
Volume (vol) | vol.114 |
Number (no) | 306 |
Page | pp.- |
#Pages | 8 |