Presentation 2007-06-15
Off-policy least-squares temporal difference learning and its convergence guarantee in finite horizon problems
Takeshi MORI, Shin-ichi MAEDA, Shin ISHII
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Recently developed off-policy temporal difference (TD) learning with linear function approximation has attracted attention because it allows sample reuse and deals effectively with the trade-off between exploration and exploitation. However, the variance of the value function grows exponentially with the length of the trajectory, and hence the learning diverges. It is therefore necessary to truncate the trajectory, but the bias of such a finite horizon trajectory can be so harmful that the value function still diverges. Consequently, off-policy TD learning has no convergence guarantee in either infinite or finite horizon problems. In this study, we propose an off-policy least-squares temporal difference (LSTD) learning method and show its convergence in finite horizon problems. Computer simulation shows that our method converges in a finite horizon problem, whereas off-policy TD learning diverges.
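The exponential growth of variance mentioned in the abstract can be illustrated with a small calculation. The sketch below is not taken from the paper: it assumes the off-policy correction is a product of per-step importance ratios (as the importance sampling keyword suggests) and, for simplicity, that both policies are state-independent so the per-step ratios are i.i.d. The two policies and all function names are hypothetical.

```python
def per_step_second_moment(target, behavior):
    """E_b[rho^2] for one step, where rho = target(a) / behavior(a)."""
    return sum(p * p / q for p, q in zip(target, behavior))


def cumulative_weight_variance(target, behavior, horizon):
    """Exact variance of the product of `horizon` i.i.d. per-step ratios."""
    # Each ratio has mean 1 under the behavior policy, so the product also
    # has mean 1 and its variance is E_b[rho^2] ** horizon - 1.
    return per_step_second_moment(target, behavior) ** horizon - 1.0


# Hypothetical policies over two actions (assumption, not from the paper).
target = [0.9, 0.1]
behavior = [0.5, 0.5]

variances = {T: cumulative_weight_variance(target, behavior, T)
             for T in (1, 5, 10, 20)}
for T, v in variances.items():
    print(f"horizon {T:2d}: variance of cumulative weight = {v:.3f}")
```

Under these assumptions the variance grows geometrically with the horizon whenever the two policies differ, and it is zero when they coincide, which is one way to see why truncating the trajectory becomes necessary.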
Keyword(in Japanese) (See Japanese page)
Keyword(in English) reinforcement learning / off policy learning / importance sampling / least-squares temporal difference learning / finite horizon problem
Paper # NC2007-14
Date of Issue

Conference Information
Committee NC
Conference Date 2007/6/7 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Off-policy least-squares temporal difference learning and its convergence guarantee in finite horizon problems
Sub Title (in English)
Keyword(1) reinforcement learning
Keyword(2) off policy learning
Keyword(3) importance sampling
Keyword(4) least-squares temporal difference learning
Keyword(5) finite horizon problem
1st Author's Name Takeshi MORI
1st Author's Affiliation Nara Institute of Science and Technology
2nd Author's Name Shin-ichi MAEDA
2nd Author's Affiliation Nara Institute of Science and Technology
3rd Author's Name Shin ISHII
3rd Author's Affiliation Nara Institute of Science and Technology
Date 2007-06-15
Paper # NC2007-14
Volume (vol) vol.107
Number (no) 92
Page pp.-
#Pages 6
Date of Issue