Presentation | 2017-03-13 Estimation of the change of agent's behavior strategy using state-action history Shihori Uchida, Shigeyuki Oba, Shin Ishii |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Reinforcement learning (RL) is a model of the learning process by which animals and intelligent agents obtain an optimal behavioral policy through interactions with unknown environments. Inverse reinforcement learning (IRL) addresses the inverse problem: characteristics of the RL agent, such as its reward function, are estimated from the history of the agent's behaviors. In an uncertain environment, the RL agent needs to balance following the currently good behavioral policy (exploitation) against following a policy that resolves the uncertainty of the environment (exploration). Existing IRL methods are not suited to identifying the RL agent's characteristics when it follows a mixed strategy, switching between exploitation and exploration depending on its situation. In this study, we propose a new IRL method that dissociates different behavioral policies sharing a common reward function. Our computer simulation showed that our method successfully identifies not only the timing of the policy change but also other RL parameters, such as behavioral randomness and the common reward function, from the agent's behaviors alone. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Reinforcement learning / Inverse reinforcement learning / Behavior strategy |
Paper # | NC2016-65 |
Date of Issue | 2017-03-06 (NC) |
Conference Information | |
Committee | MBE / NC |
---|---|
Conference Date | 2017/3/13 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Kikai-Shinko-Kaikan Bldg. |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Yutaka Fukuoka(Kogakuin Univ.) / Shigeo Sato(Tohoku Univ.) |
Vice Chair | Kazuki Nakajima(Univ. of Toyama) / Masafumi Hagiwara(Keio Univ.) |
Secretary | Kazuki Nakajima(Kogakuin Univ.) / Masafumi Hagiwara(Toyama Pref. Univ.) |
Assistant | Ryota Horie(Shibaura Inst. of Tech.) / Kim Juhyon(Univ. of Toyama) / Hisanao Akima(Tohoku Univ.) / Yoshihisa Shinozawa(Keio Univ.) |
Paper Information | |
Registration To | Technical Committee on ME and Bio Cybernetics / Technical Committee on Neurocomputing |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Estimation of the change of agent's behavior strategy using state-action history |
Sub Title (in English) | |
Keyword(1) | Reinforcement learning |
Keyword(2) | Inverse reinforcement learning |
Keyword(3) | Behavior strategy |
1st Author's Name | Shihori Uchida |
1st Author's Affiliation | Kyoto University(Kyoto Univ.) |
2nd Author's Name | Shigeyuki Oba |
2nd Author's Affiliation | Kyoto University(Kyoto Univ.) |
3rd Author's Name | Shin Ishii |
3rd Author's Affiliation | Kyoto University(Kyoto Univ.) |
Date | 2017-03-13 |
Paper # | NC2016-65 |
Volume (vol) | vol.116 |
Number (no) | NC-521 |
Page | pp.7-12(NC) |
#Pages | 6 |
Date of Issue | 2017-03-06 (NC) |