Presentation abstract / keywords |
Presentation title |
2009-03-12 15:40
Statistical Active Learning for Efficient Value Function Approximation in Reinforcement Learning ○Takayuki Akiyama・Hirotaka Hachiya・Masashi Sugiyama(Tokyo Inst. of Tech.) NC2008-147 |
Abstract |
(English) |
Appropriately designing sampling policies is highly important for obtaining better control policies in reinforcement learning. In this paper, we first show that the least-squares policy iteration (LSPI) framework allows us to employ statistical active learning methods for linear regression. Then we propose a design method of good sampling policies for efficient exploration, which is particularly useful when the sampling cost of immediate rewards is high. The proposed method combined with LSPI is called active policy iteration (API). Through simulations with a batting robot, we demonstrate the usefulness of API. |
Keywords |
(English) |
Reinforcement Learning / Active Learning / Robotics |
Bibliographic information |
IEICE Tech. Rep., vol. 108, no. 480, NC2008-147, pp. 261-266, March 2009. |
Report number |
NC2008-147 |
Date of issue |
2009-03-04 (NC) |
ISSN |
Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380 |
Copyright |
The copyright of papers published in IEICE Technical Reports belongs to the IEICE. (Permission numbers: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034) |