Presentation 2007-12-22
Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation
Hirotaka HACHIYA, Takayuki AKIYAMA, Masashi SUGIYAMA
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Off-policy reinforcement learning aims to efficiently reuse data samples gathered in the past. A common approach is to use importance sampling to compensate for the bias caused by the difference between the data-collecting policies and the target policy. However, existing off-policy methods often do not explicitly take the variance of the value function estimators into account, and their performance therefore tends to be unstable. To cope with this problem, we propose an adaptive importance sampling technique that allows us to actively control the trade-off between bias and variance. We further provide a method for optimally determining the trade-off parameter based on statistical machine learning theory.
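The bias-variance trade-off mentioned in the abstract can be illustrated with a small sketch. The abstract does not state the form of the adaptive weights, so the code below assumes one simple possibility: raising the ordinary importance weight (target probability divided by behavior probability) to a power nu in [0, 1]. With nu = 0 the weights are all 1 (low variance, biased toward the data-collecting policy); with nu = 1 they are the ordinary importance weights (unbiased, higher variance). The names nu, flattened_weights, weighted_mean and simulate, as well as the two-action toy problem, are hypothetical and are not taken from the paper.

```python
import random


def flattened_weights(target_probs, behavior_probs, nu):
    """Importance weights (target / behavior) raised to the power nu.

    nu is an assumed trade-off parameter: 0 ignores the policy mismatch,
    1 gives the ordinary importance weights.
    """
    if not 0.0 <= nu <= 1.0:
        raise ValueError("nu must lie in [0, 1]")
    return [(p / q) ** nu for p, q in zip(target_probs, behavior_probs)]


def weighted_mean(values, weights):
    """Weighted sample average: sum(w * x) / n."""
    return sum(w * v for w, v in zip(weights, values)) / len(values)


def simulate(nu, n_samples=1000, seed=0):
    """Estimate the target policy's expected reward from behavior-policy data.

    Toy setting (an assumption for illustration): two actions, the behavior
    policy mostly picks action 0, the target policy mostly picks action 1,
    and only action 1 is rewarded.
    """
    rng = random.Random(seed)
    behavior = [0.9, 0.1]   # probabilities used to collect the data
    target = [0.1, 0.9]     # probabilities of the policy being evaluated
    rewards = [0.0, 1.0]    # deterministic reward per action

    actions = [0 if rng.random() < behavior[0] else 1 for _ in range(n_samples)]
    weights = flattened_weights(
        [target[a] for a in actions],
        [behavior[a] for a in actions],
        nu,
    )
    return weighted_mean([rewards[a] for a in actions], weights)


if __name__ == "__main__":
    # Compare estimates across several values of the assumed parameter nu.
    for nu in (0.0, 0.5, 1.0):
        print(nu, simulate(nu))
```

In this toy setting the true expected reward under the target policy is 0.9. Running the script for several seeds and values of nu shows how intermediate values trade some bias for reduced spread of the estimate; selecting nu automatically, as the abstract describes, is outside the scope of this sketch.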
Keyword(in Japanese) (See Japanese page)
Keyword(in English) Off-policy Reinforcement learning / Value function approximation / Importance sampling
Paper # NC2007-84
Date of Issue

Conference Information
Committee NC
Conference Date 2007/12/15 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Neurocomputing (NC)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Adaptive Importance Sampling with Automatic Model Selection in Value Function Approximation
Sub Title (in English)
Keyword(1) Off-policy Reinforcement learning
Keyword(2) Value function approximation
Keyword(3) Importance sampling
1st Author's Name Hirotaka HACHIYA
1st Author's Affiliation Department of Computer Science, Tokyo Institute of Technology
2nd Author's Name Takayuki AKIYAMA
2nd Author's Affiliation Department of Computer Science, Tokyo Institute of Technology
3rd Author's Name Masashi SUGIYAMA
3rd Author's Affiliation Department of Computer Science, Tokyo Institute of Technology
Date 2007-12-22
Paper # NC2007-84
Volume (vol) vol.107
Number (no) 410
Page pp.-
#Pages 6
Date of Issue