Presentation | 2015-06-23 Inverse reinforcement learning based on behaviors of a learning agent Shunsuke Sakurai, Shigeyuki Oba, Shin Ishii |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | An appropriate design of the reward function is important for reinforcement learning to efficiently obtain an optimal policy for an intended goal, because different reward functions for the same goal can lead to different convergence speeds of learning. However, there is no systematic way to determine a good reward function for an arbitrary environment. How can we imitate the training strategy of a reference agent that efficiently adapts to occasional changes of the environment? In this study, we extend the apprenticeship learning framework to accept state-action-history data of a developing agent whose policy is not yet optimal but is changing toward the optimum. With this extension, the reward function is estimated by inverse reinforcement learning using the estimated change of policy of the developing reference agent, and the objective agent can imitate the policy learning process of the reference agent using the estimated reward function. We applied the proposed method to estimate the reward function of a developing agent trained on a simple 2-state Markov decision process (MDP), and showed that the process of determining the optimal policy was imitated using the reward estimated by the proposed method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | reinforcement learning / inverse reinforcement learning / apprenticeship learning / learning process |
Paper # | IBISML2015-15 |
Date of Issue | 2015-06-16 (IBISML) |
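The abstract above describes estimating a reward function from the behavior history of an agent that is still learning, rather than from an already-optimal expert. The following is a minimal illustrative sketch of that idea, not the authors' actual algorithm: it assumes a hypothetical 2-state, 2-action MDP, a softmax Q-learning reference agent, and a simple grid-search likelihood fit in which each candidate reward is scored by replaying the same learning rule over the observed state-action history.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: action 0 stays in the current
# state, action 1 switches to the other state. (An assumption for
# illustration; the paper's MDP details are not given in the abstract.)
N_STATES, N_ACTIONS = 2, 2

def run_learner(reward, n_steps, rng, beta=3.0, alpha=0.1, gamma=0.9):
    """Q-learning with a softmax (Boltzmann) policy; returns the
    state-action history generated while the agent is still learning."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    state, history = 0, []
    for _ in range(n_steps):
        probs = np.exp(beta * Q[state])
        probs /= probs.sum()
        action = rng.choice(N_ACTIONS, p=probs)
        history.append((state, action))
        nxt = state if action == 0 else 1 - state
        Q[state, action] += alpha * (reward[nxt] + gamma * Q[nxt].max()
                                     - Q[state, action])
        state = nxt
    return history

def history_log_likelihood(reward, history, beta=3.0, alpha=0.1, gamma=0.9):
    """Replay the same learning rule under a candidate reward and score
    how likely the observed, still-changing behavior is under it."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    ll = 0.0
    for state, action in history:
        probs = np.exp(beta * Q[state])
        probs /= probs.sum()
        ll += np.log(probs[action])
        nxt = state if action == 0 else 1 - state
        Q[state, action] += alpha * (reward[nxt] + gamma * Q[nxt].max()
                                     - Q[state, action])
    return ll

rng = np.random.default_rng(0)
true_reward = np.array([0.0, 1.0])          # state 1 is the rewarding one
history = run_learner(true_reward, 2000, rng)

# Grid search over candidate state rewards, fixing r(0)=0 to reduce the
# usual scale/shift ambiguity of inverse reinforcement learning.
candidates = [np.array([0.0, r1]) for r1 in np.linspace(-1.0, 2.0, 31)]
best = max(candidates, key=lambda r: history_log_likelihood(r, history))
print("estimated r(state=1):", best[1])
```

Because the history comes from a non-stationary, still-improving policy, scoring candidates by replaying the learning rule (rather than against a fixed optimal policy) is what distinguishes this setting from standard apprenticeship learning from expert demonstrations.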
Conference Information | |
Committee | NC / IPSJ-BIO / IBISML / IPSJ-MPS |
---|---|
Conference Date | 2015/6/23(3days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Okinawa Institute of Science and Technology |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Machine Learning Approach to Biodata Mining, and General |
Chair | Toshimichi Saito(Hosei Univ.) / Masakazu Sekijima(Tokyo Inst. of Tech.) / Takashi Washio(Osaka Univ.) / Hayaru Shouno(Univ. of Electro-Communications)
Vice Chair | Shigeo Sato(Tohoku Univ.) / / Kenji Fukumizu(ISM) / Masashi Sugiyama(Tokyo Inst. of Tech.) |
Secretary | Shigeo Sato(Kyushu Inst. of Tech.) / (Kyoto Sangyo Univ.) / Kenji Fukumizu(Kyoto Univ.) / Masashi Sugiyama(Ochanomizu Univ.) / (OIST)
Assistant | Hiroyuki Kanbara(Tokyo Inst. of Tech.) / Hisanao Akima(Tohoku Univ.) / / Koji Tsuda(Univ. of Tokyo) / Hisashi Kashima(Kyoto Univ.) |
Paper Information | |
Registration To | Technical Committee on Neurocomputing / Special Interest Group on Bioinformatics and Genomics / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Mathematical Modeling and Problem Solving
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Inverse reinforcement learning based on behaviors of a learning agent
Sub Title (in English) | |
Keyword(1) | reinforcement learning |
Keyword(2) | inverse reinforcement learning |
Keyword(3) | apprenticeship learning |
Keyword(4) | learning process |
1st Author's Name | Shunsuke Sakurai |
1st Author's Affiliation | Kyoto University(Kyoto Univ.) |
2nd Author's Name | Shigeyuki Oba |
2nd Author's Affiliation | Kyoto University(Kyoto Univ.) |
3rd Author's Name | Shin Ishii |
3rd Author's Affiliation | Kyoto University(Kyoto Univ.) |
Date | 2015-06-23 |
Paper # | IBISML2015-15 |
Volume (vol) | vol.115 |
Number (no) | no.112 (IBISML)
Page | pp.95-99 (IBISML)
#Pages | 5 |
Date of Issue | 2015-06-16 (IBISML) |