A belief-state reinforcement learning scheme for a multi-agent card game

Presentation	2004/3/12. A belief-state reinforcement learning scheme for a multi-agent card game. Hajime FUJITA, Shin ISHII
Abstract(in English)	In this report, we address the card game "Hearts", an instance of a decision-making problem under partial observability. We present a sampling-based state estimation method and a reinforcement learning (RL) scheme for a multi-agent environment that uses this method. Because many cards are often unobservable in this game, we formulate RL in the framework of a partially observable Markov decision process (POMDP). Using pessimistic observations, the learning agent focuses on an important region of the large state space, estimates the unobservable state by sampling from that subspace, and makes decisions by predicting the behavior of the environment. Simulation results show that our model-based POMDP-RL method with sampling-based state estimation is applicable to this realistic multi-agent problem.
Keyword(in English)	sampling method / POMDP / estimation of unobservable state variables / multi-agent / reinforcement learning (RL)
Paper #	NC2003-205

Paper Information
Registration To	Neurocomputing (NC)
Language	JPN
Title (in English)	A belief-state reinforcement learning scheme for a multi-agent card game
Keyword(1)	sampling method
Keyword(2)	POMDP
Keyword(3)	estimation of unobservable state variables
Keyword(4)	multi-agent
Keyword(5)	reinforcement learning (RL)
1st Author's Name	Hajime FUJITA
1st Author's Affiliation	Nara Institute of Science and Technology
2nd Author's Name	Shin ISHII
2nd Author's Affiliation	CREST, Japan Science and Technology Agency
Date	2004/3/12
Volume (vol)	vol.103
Number (no)	734
#Pages	6