Presentation | 2015-06-23 Inverse reinforcement learning based on behaviors of a learning agent Shunsuke Sakurai, Shigeyuki Oba, Shin Ishii |
---|---|
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | An appropriate design of the reward function is important for reinforcement learning to efficiently obtain an optimal policy for an intended goal, because different reward functions for the same goal can lead to different convergence speeds of learning. However, there is no systematic way to determine a good reward function for an arbitrary environment. How can we imitate the training strategy of a reference agent that efficiently adapts to occasional changes of the environment? In this study, we extend the apprenticeship learning framework to accept state-action-history data of a developing agent whose policy is not yet optimal but is changing toward the optimum. With this extension, the reward function is estimated by inverse reinforcement learning using the estimated change of policy of the developing reference agent, and the objective agent can imitate the policy learning process of the reference agent using the estimated reward function. We applied the proposed method to estimate the reward function of a developing agent trained on a simple 2-state Markov decision process (MDP), and showed that the process of determining the optimal policy was imitated using the reward estimated by the proposed method. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | reinforcement learning / inverse reinforcement learning / apprenticeship learning / learning process |
Paper # | IBISML2015-15 |
Date of Issue | 2015-06-16 (IBISML) |
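The abstract above describes estimating a reward function from the behavior history of an agent that is still learning, rather than from an already-optimal expert. The following is a minimal illustrative sketch of that idea, not the authors' actual algorithm: it assumes a hypothetical 2-state, 2-action MDP, a softmax Q-learning reference agent, and a simple grid-search likelihood fit in which each candidate reward is scored by replaying the same learning rule over the observed state-action history.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP: action 0 stays in the current
# state, action 1 switches to the other state. (An assumption for
# illustration; the paper's MDP details are not given in the abstract.)
N_STATES, N_ACTIONS = 2, 2

def run_learner(reward, n_steps, rng, beta=3.0, alpha=0.1, gamma=0.9):
    """Q-learning with a softmax (Boltzmann) policy; returns the
    state-action history generated while the agent is still learning."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    state, history = 0, []
    for _ in range(n_steps):
        probs = np.exp(beta * Q[state])
        probs /= probs.sum()
        action = rng.choice(N_ACTIONS, p=probs)
        history.append((state, action))
        nxt = state if action == 0 else 1 - state
        Q[state, action] += alpha * (reward[nxt] + gamma * Q[nxt].max()
                                     - Q[state, action])
        state = nxt
    return history

def history_log_likelihood(reward, history, beta=3.0, alpha=0.1, gamma=0.9):
    """Replay the same learning rule under a candidate reward and score
    how likely the observed, still-changing behavior is under it."""
    Q = np.zeros((N_STATES, N_ACTIONS))
    ll = 0.0
    for state, action in history:
        probs = np.exp(beta * Q[state])
        probs /= probs.sum()
        ll += np.log(probs[action])
        nxt = state if action == 0 else 1 - state
        Q[state, action] += alpha * (reward[nxt] + gamma * Q[nxt].max()
                                     - Q[state, action])
    return ll

rng = np.random.default_rng(0)
true_reward = np.array([0.0, 1.0])          # state 1 is the rewarding one
history = run_learner(true_reward, 2000, rng)

# Grid search over candidate state rewards, fixing r(0)=0 to reduce the
# usual scale/shift ambiguity of inverse reinforcement learning.
candidates = [np.array([0.0, r1]) for r1 in np.linspace(-1.0, 2.0, 31)]
best = max(candidates, key=lambda r: history_log_likelihood(r, history))
print("estimated r(state=1):", best[1])
```

Because the history comes from a non-stationary, still-improving policy, scoring candidates by replaying the learning rule (rather than against a fixed optimal policy) is what distinguishes this setting from standard apprenticeship learning from expert demonstrations.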
Conference Information | |
Committee | NC / IPSJ-BIO / IBISML / IPSJ-MPS |
---|---|
Conference Date | 2015/6/23(3days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Okinawa Institute of Science and Technology |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Machine Learning Approach to Biodata Mining, and General |
Chair | Toshimichi Saito(Hosei Univ.) / Masakazu Sekijima(Tokyo Inst. of Tech.) / Takashi Washio(Osaka Univ.) / Hayaru Shouno(Univ. of Electro-Communications)
Vice Chair | Shigeo Sato(Tohoku Univ.) / / Kenji Fukumizu(ISM) / Masashi Sugiyama(Tokyo Inst. of Tech.) |
Secretary | Shigeo Sato(Kyushu Inst. of Tech.) / (Kyoto Sangyo Univ.) / Kenji Fukumizu(Kyoto Univ.) / Masashi Sugiyama(Ochanomizu Univ.) / (OIST)
Assistant | Hiroyuki Kanbara(Tokyo Inst. of Tech.) / Hisanao Akima(Tohoku Univ.) / / Koji Tsuda(Univ. of Tokyo) / Hisashi Kashima(Kyoto Univ.) |
Paper Information | |
Registration To | Technical Committee on Neurocomputing / Special Interest Group on Bioinformatics and Genomics / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Mathematical Modeling and Problem Solving
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Inverse reinforcement learning based on behaviors of a learning agent
Sub Title (in English) | |
Keyword(1) | reinforcement learning |
Keyword(2) | inverse reinforcement learning |
Keyword(3) | apprenticeship learning |
Keyword(4) | learning process |
1st Author's Name | Shunsuke Sakurai |
1st Author's Affiliation | Kyoto University(Kyoto Univ.) |
2nd Author's Name | Shigeyuki Oba |
2nd Author's Affiliation | Kyoto University(Kyoto Univ.) |
3rd Author's Name | Shin Ishii |
3rd Author's Affiliation | Kyoto University(Kyoto Univ.) |
Date | 2015-06-23 |
Paper # | IBISML2015-15 |
Volume (vol) | vol.115 |
Number (no) | no.112 (IBISML)
Page | pp.95-99 (IBISML)
#Pages | 5 |
Date of Issue | 2015-06-16 (IBISML) |