Estimation of the change of agent's behavior strategy using state-action history

Shihori Uchida; Shigeyuki Oba; Shin Ishii

Presentation	2017-03-13 Estimation of the change of agent's behavior strategy using state-action history / Shihori Uchida, Shigeyuki Oba, Shin Ishii
Abstract(in English)	Reinforcement learning (RL) models how animals and intelligent agents learn an optimal behavioral policy through interaction with an unknown environment. Inverse reinforcement learning (IRL) addresses the reverse problem: estimating characteristics of an RL agent, such as its reward function, from the history of its behavior. In an uncertain environment, an RL agent must balance following the behavioral policy that currently looks best (exploitation) against behaving so as to reduce its uncertainty about the environment (exploration). Existing IRL methods are not suited to identifying an agent's characteristics when it follows a mixed strategy, switching between exploitation and exploration depending on its situation. In this study, we propose a new IRL method that separates distinct behavioral policies sharing a common reward function. Computer simulations showed that, from the agent's behavior alone, our method identifies not only the timing of the policy change but also other RL parameters, such as behavioral randomness and the common reward function.
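The abstract does not give the model details. As a rough illustration only, the Python sketch below shows the general idea of recovering a policy switch from an action history. It assumes (none of this comes from the paper) a single-state task with a known reward vector shared by both phases, softmax (Boltzmann) action selection in which the inverse temperature beta stands in for behavioral randomness, and exactly one change point. The change point and the two beta values are found by maximum likelihood over a grid. All function names are hypothetical, and unlike the proposed method, this sketch does not estimate the reward function itself.

```python
# Sketch under an ASSUMED toy model; this is not the authors' IRL method.
# Single state, known shared reward vector, softmax policy whose inverse
# temperature (beta) switches once at an unknown step tau.
import math
import random


def softmax_probs(values, beta):
    """Boltzmann (softmax) action probabilities; larger beta means less random choices."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - m)) for v in values]
    z = sum(exps)
    return [e / z for e in exps]


def simulate(rewards, beta_before, beta_after, tau, n_steps, seed=0):
    """Hypothetical agent: one reward vector throughout, inverse temperature switches at step tau."""
    rng = random.Random(seed)
    actions = []
    for t in range(n_steps):
        beta = beta_before if t < tau else beta_after
        p = softmax_probs(rewards, beta)
        actions.append(rng.choices(range(len(rewards)), weights=p)[0])
    return actions


def segment_loglik(actions, rewards, beta, start, end):
    """Log-likelihood of actions[start:end] under a single softmax policy."""
    logp = [math.log(p) for p in softmax_probs(rewards, beta)]
    return sum(logp[a] for a in actions[start:end])


def estimate_change(actions, rewards, betas):
    """Grid search for (tau, beta_before, beta_after) maximizing the joint log-likelihood."""
    n = len(actions)
    best = None
    for tau in range(1, n):  # both segments must be non-empty
        b1 = max(betas, key=lambda b: segment_loglik(actions, rewards, b, 0, tau))
        b2 = max(betas, key=lambda b: segment_loglik(actions, rewards, b, tau, n))
        ll = (segment_loglik(actions, rewards, b1, 0, tau)
              + segment_loglik(actions, rewards, b2, tau, n))
        if best is None or ll > best[0]:
            best = (ll, tau, b1, b2)
    return best[1:]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.5]  # hypothetical common reward vector
    acts = simulate(rewards, beta_before=0.5, beta_after=5.0, tau=100, n_steps=200)
    grid = [0.25 * k for k in range(1, 41)]
    print(estimate_change(acts, rewards, grid))  # (tau_hat, beta_before_hat, beta_after_hat)
```

In this toy setting, a small beta before the switch (near-uniform, exploratory choices) followed by a large beta after it (mostly the highest-reward action) is what a shift from exploration to exploitation looks like. Fitting one beta per segment, rather than per step, keeps the search to a simple grid.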
Keyword(in English)	Reinforcement learning / Inverse reinforcement learning / Behavior strategy
Paper #	NC2016-65
Date of Issue	2017-03-06 (NC)

Conference Information
Committee	MBE / NC
Conference Date	2017/3/13 (2 days)
Place (in English)	Kikai-Shinko-Kaikan Bldg.
Chair	Yutaka Fukuoka(Kogakuin Univ.) / Shigeo Sato(Tohoku Univ.)
Vice Chair	Kazuki Nakajima(Univ. of Toyama) / Masafumi Hagiwara(Keio Univ.)
Secretary	Kazuki Nakajima(Kogakuin Univ.) / Masafumi Hagiwara(Toyama Pref. Univ.)
Assistant	Ryota Horie(Shibaura Inst. of Tech.) / Kim Juhyon(Univ. of Toyama) / Hisanao Akima(Tohoku Univ.) / Yoshihisa Shinozawa(Keio Univ.)

Paper Information
Registration To	Technical Committee on ME and Bio Cybernetics / Technical Committee on Neurocomputing
Language	JPN
Title (in English)	Estimation of the change of agent's behavior strategy using state-action history
Keyword(1)	Reinforcement learning
Keyword(2)	Inverse reinforcement learning
Keyword(3)	Behavior strategy
1st Author's Name	Shihori Uchida
1st Author's Affiliation	Kyoto University(Kyoto Univ.)
2nd Author's Name	Shigeyuki Oba
2nd Author's Affiliation	Kyoto University(Kyoto Univ.)
3rd Author's Name	Shin Ishii
3rd Author's Affiliation	Kyoto University(Kyoto Univ.)
Date	2017-03-13
Volume (vol)	vol.116
Number (no)	NC-521
Page	pp.7-12 (NC)
#Pages	6