Presentation | 2022-06-09 Improvement of Learning Performance by Using a Symmetric Constraint Condition in PPO Naoki Iwaya, Hidehiro Nakano |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Deep Reinforcement Learning (DRL) is a family of algorithms that learn optimal actions from experience. PPO KL Penalty, a DRL method, uses a KL constraint to suppress large update values and prevent erroneous recognition, which can shorten the learning time. However, PPO KL Penalty is unstable because the KL divergence is asymmetric. This research aims to apply a symmetric constraint to improve learning stability and efficiency. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Deep Reinforcement Learning / Policy gradient method / PPO |
Paper # | NLP2022-3,CCS2022-3 |
Date of Issue | 2022-06-02 (NLP, CCS) |
Conference Information | |
Committee | CCS / NLP |
---|---|
Conference Date | 2022/6/9 (2 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | Megumi Akai(Hokkaido Univ.) / Akio Tsuneda(Kumamoto Univ.) |
Vice Chair | Masaki Aida(TMU) / Hidehiro Nakano(Tokyo City Univ.) / Hiroyuki Torikai(Hosei Univ.) |
Secretary | Masaki Aida(TDK) / Hidehiro Nakano(Shibaura Insti. of Tech.) / Hiroyuki Torikai(Sojo Univ.) |
Assistant | Tomoyuki Sasaki(Shonan Instit. of Tech.) / Hiroyasu Ando(Tsukuba Univ.) / Miki Kobayashi(Rissho Univ.) / Hiroyuki YASUDA(The Univ. of Tokyo) / Yuichi Yokoi(Nagasaki Univ.) / Yoshikazu Yamanaka(Utsunomiya Univ.) |
Paper Information | |
Registration To | Technical Committee on Complex Communication Sciences / Technical Committee on Nonlinear Problems |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Improvement of Learning Performance by Using a Symmetric Constraint Condition in PPO |
Sub Title (in English) | |
Keyword(1) | Deep Reinforcement Learning |
Keyword(2) | Policy gradient method |
Keyword(3) | PPO |
1st Author's Name | Naoki Iwaya |
1st Author's Affiliation | Tokyo City University(Tokyo City Univ.) |
2nd Author's Name | Hidehiro Nakano |
2nd Author's Affiliation | Tokyo City University(Tokyo City Univ.) |
Date | 2022-06-09 |
Paper # | NLP2022-3,CCS2022-3 |
Volume (vol) | vol.122 |
Number (no) | NLP-65,CCS-66 |
Page | pp.13-16(NLP), pp.13-16(CCS) |
#Pages | 4 |
Date of Issue | 2022-06-02 (NLP, CCS) |
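
The abstract's remark that KL divergence is asymmetric can be illustrated with a small numerical sketch. The abstract does not say which symmetric constraint the authors use, so the symmetrized form below (the sum of both KL directions, often called the Jeffreys divergence) is only an assumed stand-in, and the function names and the two example policies are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions.

    Assumes every q[i] is strictly positive wherever p[i] is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def symmetric_kl(p, q):
    """Assumed symmetric variant: KL(p || q) + KL(q || p) (Jeffreys divergence)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical action probabilities of a policy before and after one update.
old_policy = [0.9, 0.1]
new_policy = [0.5, 0.5]

forward = kl_divergence(old_policy, new_policy)  # KL(old || new)
reverse = kl_divergence(new_policy, old_policy)  # KL(new || old)

print("KL(old || new):", forward)
print("KL(new || old):", reverse)
print("symmetric:     ", symmetric_kl(old_policy, new_policy))
```

The two KL directions give different values for the same pair of policies, so a penalty built on one direction treats the same update differently depending on which policy is taken as the reference. The symmetrized quantity returns the same value in either argument order.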