Presentation | 2007-06-15 Off-policy least-squares temporal difference learning and its convergence guarantee in finite horizon problems Takeshi MORI, Shin-ichi MAEDA, Shin ISHII |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Recently developed off-policy temporal difference (TD) learning with linear function approximation has attracted attention because it allows sample reuse and deals effectively with the trade-off between exploration and exploitation. However, the variance of the value function grows exponentially with the length of the trajectory, and hence the learning diverges. It is therefore necessary to truncate the trajectory, but the bias introduced by such a finite horizon trajectory can be so harmful that the value function also diverges. Consequently, off-policy TD learning has no convergence guarantee in either infinite or finite horizon problems. In this study, we propose an off-policy least-squares temporal difference (LSTD) learning method and show its convergence in finite horizon problems. Computer simulation shows that our method converges in a finite horizon problem whereas off-policy TD learning diverges. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | reinforcement learning / off policy learning / importance sampling / least-squares temporal difference learning / finite horizon problem |
Paper # | NC2007-14 |
Date of Issue |
Conference Information | |
Committee | NC |
---|---|
Conference Date | 2007/6/7 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Neurocomputing (NC) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Off-policy least-squares temporal difference learning and its convergence guarantee in finite horizon problems |
Sub Title (in English) | |
Keyword(1) | reinforcement learning |
Keyword(2) | off policy learning |
Keyword(3) | importance sampling |
Keyword(4) | least-squares temporal difference learning |
Keyword(5) | finite horizon problem |
1st Author's Name | Takeshi MORI |
1st Author's Affiliation | Nara Institute of Science and Technology |
2nd Author's Name | Shin-ichi MAEDA |
2nd Author's Affiliation | Nara Institute of Science and Technology |
3rd Author's Name | Shin ISHII |
3rd Author's Affiliation | Nara Institute of Science and Technology |
Date | 2007-06-15 |
Paper # | NC2007-14 |
Volume (vol) | vol.107 |
Number (no) | 92 |
Page | pp.- |
#Pages | 6 |
Date of Issue |