Improvement of Learning Performance by Using a Symmetric Constraint Condition in PPO

Presentation	2022-06-09 Improvement of Learning Performance by Using a Symmetric Constraint Condition in PPO Naoki Iwaya, Hidehiro Nakano
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	Deep Reinforcement Learning (DRL) is a class of algorithms that learn optimal actions from experience. PPO KL Penalty, a DRL method, uses a KL-divergence constraint to suppress large update values and prevent erroneous recognition, which can shorten learning time. However, PPO KL Penalty is unstable because the KL divergence is asymmetric. This research aims to apply a symmetric constraint to improve learning stability and efficiency.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	Deep Reinforcement Learning / Policy gradient method / PPO
Paper #	NLP2022-3,CCS2022-3
Date of Issue	2022-06-02 (NLP, CCS)
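
The asymmetry of the KL divergence mentioned in the abstract can be illustrated with a small numerical sketch. The abstract does not say which symmetric constraint the authors use, so the symmetrized KL below (the average of the two directions) is only an assumption chosen for illustration, and the function names and the two example distributions are hypothetical.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Average of both KL directions (an assumed stand-in, not the paper's method)."""
    return 0.5 * (kl(p, q) + kl(q, p))


# Hypothetical action probabilities of an old and an updated policy in one state.
old_policy = [0.9, 0.1]
new_policy = [0.5, 0.5]

forward = kl(old_policy, new_policy)   # KL(old || new)
reverse = kl(new_policy, old_policy)   # KL(new || old)
sym = symmetric_kl(old_policy, new_policy)

print(f"KL(old || new) = {forward:.4f}")
print(f"KL(new || old) = {reverse:.4f}")
print(f"symmetrized    = {sym:.4f}")
```

The two directions give different values for the same pair of policies, so a penalty built on one direction treats an update differently from the reverse update, whereas the symmetrized quantity is unchanged when the arguments are swapped.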

Conference Information
Committee	CCS / NLP
Conference Date	2022/6/9(2days)
Place (in Japanese)	(See Japanese page)
Place (in English)
Topics (in Japanese)	(See Japanese page)
Topics (in English)
Chair	Megumi Akai(Hokkaido Univ.) / Akio Tsuneda(Kumamoto Univ.)
Vice Chair	Masaki Aida(TMU) / Hidehiro Nakano(Tokyo City Univ.) / Hiroyuki Torikai(Hosei Univ.)
Secretary	Masaki Aida(TDK) / Hidehiro Nakano(Shibaura Insti. of Tech.) / Hiroyuki Torikai(Sojo Univ.)
Assistant	Tomoyuki Sasaki(Shonan Instit. of Tech.) / Hiroyasu Ando(Tsukuba Univ.) / Miki Kobayashi(Rissho Univ.) / Hiroyuki YASUDA(The Univ. of Tokyo) / Yuichi Yokoi(Nagasaki Univ.) / Yoshikazu Yamanaka(Utsunomiya Univ.)

Paper Information
Registration To	Technical Committee on Complex Communication Sciences / Technical Committee on Nonlinear Problems
Language	JPN
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Improvement of Learning Performance by Using a Symmetric Constraint Condition in PPO
Sub Title (in English)
Keyword(1)	Deep Reinforcement Learning
Keyword(2)	Policy gradient method
Keyword(3)	PPO
1st Author's Name	Naoki Iwaya
1st Author's Affiliation	Tokyo City University(Tokyo City Univ.)
2nd Author's Name	Hidehiro Nakano
2nd Author's Affiliation	Tokyo City University(Tokyo City Univ.)
Date	2022-06-09
Paper #	NLP2022-3,CCS2022-3
Volume (vol)	vol.122
Number (no)	NLP-65,CCS-66
Page	pp.13-16(NLP), pp.13-16(CCS)
#Pages	4
Date of Issue	2022-06-02 (NLP, CCS)