Presentation 2015-06-23
Inverse reinforcement learning based on behaviors of a learning agent
Shunsuke Sakurai, Shigeyuki Oba, Shin Ishii
Abstract(in English) An appropriate design of the reward function is important for reinforcement learning to efficiently obtain an optimal policy for an intended goal, because different reward functions for the same goal can lead to different convergence speeds of learning. However, there is no systematic way to determine a good reward function for an arbitrary environment. How can we imitate the training strategy of a reference agent who efficiently adapts to occasional changes of the environment? In this study, we extend the apprenticeship learning framework to accept state-action-history data of a developing agent whose policy is not yet optimal but is changing toward the optimum. With this extension, the reward function is estimated by inverse reinforcement learning from the estimated change in the policy of a developing reference agent, and the objective agent can imitate the policy-learning process of the reference agent using the estimated reward function. We applied the proposed method to estimate the reward function of a developing agent trained on a simple two-state Markov decision process (MDP), and showed that the process of reaching the optimal policy is imitated using the reward function estimated by the proposed method.
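The abstract's setting can be illustrated with a minimal sketch: a two-state MDP and an epsilon-greedy Q-learner whose state-action history is recorded while its policy is still improving. This is exactly the kind of "developing agent" trajectory data the proposed framework takes as input; the MDP parameters, reward table, and function names below are hypothetical illustrations, not taken from the paper.

```python
import random

# Hypothetical two-state MDP: states {0, 1}, actions {0: "stay", 1: "move"}.
# "move" deterministically switches the state; reward 1 only for staying in state 1.
REWARDS = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.0}

def step(state, action):
    """Deterministic transition: action 1 flips the state."""
    next_state = state if action == 0 else 1 - state
    return next_state, REWARDS[(state, action)]

def q_learning_history(episodes=200, steps=20, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Train an epsilon-greedy Q-learner and record its state-action history.

    The returned history is the kind of data the proposed framework would
    consume: trajectories of a *developing* agent whose policy is still
    changing toward the optimum, rather than of an already-optimal expert.
    """
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in (0, 1) for a in (0, 1)}
    history = []
    for _ in range(episodes):
        state = rng.choice((0, 1))
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                action = rng.choice((0, 1))
            else:
                action = max((0, 1), key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            # Standard Q-learning update.
            best_next = max(q[(next_state, a)] for a in (0, 1))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            history.append((state, action))
            state = next_state
    return q, history

q, history = q_learning_history()
# Greedy policy after training: move toward state 1, then stay there.
policy = {s: max((0, 1), key=lambda a: q[(s, a)]) for s in (0, 1)}
print(policy)
```

An inverse-reinforcement-learning procedure in the spirit of the paper would receive `history` (not `REWARDS`) and try to recover a reward function under which this gradual policy change is well explained.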
Keyword(in English) reinforcement learning / inverse reinforcement learning / apprenticeship learning / learning process
Paper # IBISML2015-15
Date of Issue 2015-06-16 (IBISML)

Conference Information
Committee NC / IPSJ-BIO / IBISML / IPSJ-MPS
Conference Date 2015/6/23 (3 days)
Place (in English) Okinawa Institute of Science and Technology
Topics (in English) Machine Learning Approach to Biodata Mining, and General
Chair Toshimichi Saito(Hosei Univ.) / Masakazu Sekijima(Tokyo Inst. of Tech.) / Takashi Washio(Osaka Univ.) / Hayaru Shouno(Univ. of Electro-Communications)
Vice Chair Shigeo Sato(Tohoku Univ.) / / Kenji Fukumizu(ISM) / Masashi Sugiyama(Tokyo Inst. of Tech.)
Secretary Shigeo Sato(Kyushu Inst. of Tech.) / (Kyoto Sangyo Univ.) / Kenji Fukumizu(Kyoto Univ.) / Masashi Sugiyama(Ochanomizu Univ.) / (OIST)
Assistant Hiroyuki Kanbara(Tokyo Inst. of Tech.) / Hisanao Akima(Tohoku Univ.) / / Koji Tsuda(Univ. of Tokyo) / Hisashi Kashima(Kyoto Univ.)

Paper Information
Registration To Technical Committee on Neurocomputing / Special Interest Group on Bioinformatics and Genomics / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Mathematical Modeling and Problem Solving
Language JPN
Title (in English) Inverse reinforcement learning based on behaviors of a learning agent
Sub Title (in English)
Keyword(1) reinforcement learning
Keyword(2) inverse reinforcement learning
Keyword(3) apprenticeship learning
Keyword(4) learning process
1st Author's Name Shunsuke Sakurai
1st Author's Affiliation Kyoto University(Kyoto Univ.)
2nd Author's Name Shigeyuki Oba
2nd Author's Affiliation Kyoto University(Kyoto Univ.)
3rd Author's Name Shin Ishii
3rd Author's Affiliation Kyoto University(Kyoto Univ.)
Date 2015-06-23
Paper # IBISML2015-15
Volume (vol) vol.115
Number (no) IBISML-112
Page pp.95-99 (IBISML)
#Pages 5
Date of Issue 2015-06-16 (IBISML)