A Reinforcement Learning Method That Estimates Internal State Based on Reward

Presentation	2004/6/18 A reinforcement learning for a policy involving value-directed internal state Yutaka NAKAMURA, Shin ISHII
Abstract(in English)	Many studies on partially observable Markov decision processes (POMDPs) employ a "belief state", which represents the state of the environment, to estimate the value function. However, obtaining the value function is often intractable because the space of belief states is very large. Recently, policy gradient methods that incorporate value learning have been developed and shown to be efficient. In this report, we propose a natural policy gradient method for a policy that involves an internal state. Computer simulations show that our reinforcement learning method acquires a good controller for a linear dynamical system with unobservable variables.
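
The belief state mentioned in the abstract can be illustrated for a small discrete POMDP. The following is a minimal sketch of the standard Bayes-filter belief update, not the method proposed in the report; the function name `belief_update`, the array layouts of `T` and `O`, and the two-state model are assumptions made for illustration.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step for a discrete POMDP (illustrative sketch).

    belief: shape (S,), current distribution over hidden states
    T:      shape (A, S, S), T[a, s, s2] = P(s2 | s, a)
    O:      shape (A, S, K), O[a, s2, o] = P(o | s2, a)
    """
    # Prediction step: push the belief through the transition model.
    predicted = belief @ T[action]
    # Correction step: weight each state by the observation likelihood.
    unnormalized = predicted * O[action][:, observation]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return unnormalized / total

# Hypothetical model: 2 hidden states, 1 action, 2 observations.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])
b0 = np.array([0.5, 0.5])
b1 = belief_update(b0, action=0, observation=0, T=T, O=O)
```

Because the belief is a point in a continuous probability simplex whose dimension grows with the number of hidden states, a value function defined over beliefs quickly becomes hard to represent, which is the difficulty the abstract points to.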
Keyword(in English)	Reinforcement learning / policy gradient method / natural policy gradient method / partially observable Markov decision process / least squares temporal difference learning
Paper #	NC2004-33

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in English)	A reinforcement learning for a policy involving value-directed internal state
1st Author's Name	Yutaka NAKAMURA
1st Author's Affiliation	Nara Institute of Science and Technology, Theoretical Life-Science Laboratory:CREST, Japan Science and Technology Agency
2nd Author's Name	Shin ISHII
2nd Author's Affiliation	Nara Institute of Science and Technology, Theoretical Life-Science Laboratory:CREST, Japan Science and Technology Agency
Date	2004/6/18
Volume (vol)	vol.104
Number (no)	140
Page	pp.-
#Pages	6