Presentation | 2004/6/18 A reinforcement learning for a policy involving value-directed internal state Yutaka NAKAMURA, Shin ISHII |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Many studies of partially observable Markov decision processes employ a "belief state", which represents the state of the environment, in order to estimate the value function. However, obtaining the value function is often intractable because the space of belief states is very large. Recently, policy gradient methods that involve value learning have been developed and shown to be efficient. In this report, we propose a natural policy gradient method for a policy that involves an internal state. Computer simulations show that our reinforcement learning method acquires a good controller for a linear dynamical system with unobservable variables. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Reinforcement learning / policy gradient method / natural policy gradient method / partially observable Markov decision process / least squares temporal difference learning |
Paper # | NC2004-33 |
Date of Issue |
Conference Information | |
Committee | NC |
---|---|
Conference Date | 2004/6/18 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Neurocomputing (NC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | A reinforcement learning for a policy involving value-directed internal state |
Sub Title (in English) | |
Keyword(1) | Reinforcement learning |
Keyword(2) | policy gradient method |
Keyword(3) | natural policy gradient method |
Keyword(4) | partially observable Markov decision process |
Keyword(5) | least squares temporal difference learning |
1st Author's Name | Yutaka NAKAMURA |
1st Author's Affiliation | Nara Institute of Science and Technology, Theoretical Life-Science Laboratory:CREST, Japan Science and Technology Agency |
2nd Author's Name | Shin ISHII |
2nd Author's Affiliation | Nara Institute of Science and Technology, Theoretical Life-Science Laboratory:CREST, Japan Science and Technology Agency |
Date | 2004/6/18 |
Paper # | NC2004-33 |
Volume (vol) | vol.104 |
Number (no) | 140 |
Page | pp.- |
#Pages | 6 |
Date of Issue |