Off-policy LSTD(λ) method for episodic tasks and its convergence (Bio-data mining by machine learning, general)

Presentation	2007-06-15 Off-policy least-squares temporal difference learning and its convergence guarantee in finite horizon problems Takeshi MORI, Shin-ichi MAEDA, Shin ISHII
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Recently developed off-policy temporal difference (TD) learning with linear function approximation has attracted attention because it allows sample reuse and deals effectively with the trade-off between exploration and exploitation. However, the variance of the value function grows exponentially with the trajectory length, and hence the learning diverges. It is then necessary to truncate the trajectory, but the bias of such a finite horizon trajectory can be so harmful that the value function also diverges. Therefore, in both infinite and finite horizon problems, off-policy TD learning has no convergence guarantee. In this study, we propose an off-policy least-squares temporal difference (LSTD) learning method and show its convergence in finite horizon problems. A computer simulation shows that our method converges in a finite horizon problem, whereas off-policy TD learning diverges.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	reinforcement learning / off policy learning / importance sampling / least-squares temporal difference learning / finite horizon problem
Paper #	NC2007-14
Date of Issue

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Off-policy least-squares temporal difference learning and its convergence guarantee in finite horizon problems
Sub Title (in English)
Keyword(1)	reinforcement learning
Keyword(2)	off policy learning
Keyword(3)	importance sampling
Keyword(4)	least-squares temporal difference learning
Keyword(5)	finite horizon problem
1st Author's Name	Takeshi MORI
1st Author's Affiliation	Nara Institute of Science and Technology
2nd Author's Name	Shin-ichi MAEDA
2nd Author's Affiliation	Nara Institute of Science and Technology
3rd Author's Name	Shin ISHII
3rd Author's Affiliation	Nara Institute of Science and Technology
Date	2007-06-15
Paper #	NC2007-14
Volume (vol)	vol.107
Number (no)	92
Page	pp.-
#Pages	6
Date of Issue