The Policy Gradient On The Average Reward Manifold

Matsubara, Takamitsu; Morimoto, Jun

IEICE Technical Committee Submission System
Conference Paper's Information

Paper Abstract and Keywords
Presentation		2007-12-22 16:10 The Policy Gradient On The Average Reward Manifold Takamitsu Matsubara (NAIST/ATR), Jun Morimoto (JST/ATR) NC2007-85
Abstract	(in English)	In this paper we propose a novel policy-gradient-type reinforcement learning method on the average reward manifold, in which a metric is introduced to measure the effect of a change in policy parameters on the average reward. In our method, the derivative of the average reward with respect to the policy improvement can be fixed as a constant. Moreover, around a (sub-)optimal policy, the policy gradient method is equivalent to the Newton method. Simple simulation results, compared with previously proposed natural policy gradient methods, demonstrate the effectiveness of our policy gradient method.
Keyword	(in English)	Reinforcement Learning / Policy Gradient / Natural Policy Gradient
Reference Info.		IEICE Tech. Rep., vol. 107, no. 410, NC2007-85, pp. 81-86, Dec. 2007.
Paper #		NC2007-85
Date of Issue		2007-12-15 (NC)
ISSN		Print edition: ISSN 0913-5685 Online edition: ISSN 2432-6380
Copyright and reproduction		All rights are reserved and no part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission in writing from the publisher. Notwithstanding, instructors are permitted to photocopy isolated articles for noncommercial classroom use without fee. (License No.: 10GA0019/12GB0052/13GB0056/17GB0034/18GB0034)

Conference Information
Committee	MBE NC
Conference Date	2007-12-22 - 2007-12-22
Paper Information
Registration To	NC
Conference Code	2007-12-MBE-NC
Language	English (Japanese title is available)
Title (in English)	The Policy Gradient On The Average Reward Manifold
Keyword(1)	Reinforcement Learning
Keyword(2)	Policy Gradient
Keyword(3)	Natural Policy Gradient
1st Author's Name	Takamitsu Matsubara
1st Author's Affiliation	Nara Institute of Science and Technology, ATR-CNS (NAIST/ATR)
2nd Author's Name	Jun Morimoto
2nd Author's Affiliation	JST-ICORP, ATR-CNS (JST/ATR)
Speaker	Author-1
Date Time	2007-12-22 16:10:00
Presentation Time	25 minutes
Volume (vol)	vol.107
Number (no)	no.410
Page	pp.81-86
#Pages	6

The Institute of Electronics, Information and Communication Engineers (IEICE), Japan