A belief-state reinforcement learning scheme for a multi-agent card game

Hajime Fujita; Shin Ishii

Presentation	2004/3/12 A belief-state reinforcement learning scheme for a multi-agent card game, Hajime Fujita, Shin Ishii
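Abstract (translated from Japanese)	This report takes the card game Hearts, a real-world problem that forms a partially observable environment, as a research task and proposes a sampling-based method for estimating its unobservable states. It also proposes a multi-agent reinforcement learning method that incorporates this state estimation. Because Hearts involves many unobservable cards, the method formulates the game as a partially observable Markov decision process (POMDP). By focusing on a pessimistic observation space, the learning agent extracts only the important subspace of the vast state space, estimates the unobservable states by sampling, and decides its own actions by predicting the dynamics of the environment. Computer simulations show that the method is effective for the reinforcement learning problem posed by Hearts.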
Abstract (English)	This report addresses the card game "Hearts", an instance of decision making under partial observability. We present a state estimation method based on sampling, and a reinforcement learning (RL) scheme for a multi-agent environment that uses this state estimation method. Because many cards are unobservable in this game, RL is formulated in the framework of a partially observable Markov decision process (POMDP). Using pessimistic observations, the learning agent focuses on an important region of the large state space, estimates unobservable states by sampling from that subspace, and chooses its actions by predicting the behavior of the environment. Simulation results show that our model-based POMDP-RL method with sampling-based state estimation is applicable to this realistic multi-agent problem.
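Keywords (translated from Japanese)	sampling method / partially observable problem / estimation of unobservable states / multi-agent system / reinforcement learning
Keywords (English)	sampling method / POMDP / estimation of unobservable state variables / multi-agent / reinforcement learning (RL)
Report number	NC2003-205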
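
The abstract describes estimating unobservable states by sampling. This record does not give the details of the authors' estimator, so the following is only a minimal sketch of the general idea: it assumes a uniform prior over all deals of the unseen cards and ignores any evidence from play history. The function names `sample_hidden_hands` and `estimate_holding_probability`, and the two-letter card codes, are hypothetical and chosen for illustration.

```python
import random


def sample_hidden_hands(unseen_cards, hand_sizes, rng):
    """Draw one candidate hidden state: a random deal of the unseen cards.

    Assumption: every deal consistent with the hand sizes is equally likely.
    """
    cards = list(unseen_cards)
    if sum(hand_sizes) != len(cards):
        raise ValueError("hand sizes must account for every unseen card")
    rng.shuffle(cards)
    hands, start = [], 0
    for size in hand_sizes:
        hands.append(cards[start:start + size])
        start += size
    return hands


def estimate_holding_probability(unseen_cards, hand_sizes, card, opponent,
                                 n_samples=2000, seed=0):
    """Monte Carlo estimate of P(opponent holds card) from sampled deals."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        hands = sample_hidden_hands(unseen_cards, hand_sizes, rng)
        if card in hands[opponent]:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    unseen = ["QS", "AH", "KH", "2C", "7D", "JS"]
    # Three opponents, each holding two of the six unseen cards.
    print(estimate_holding_probability(unseen, [2, 2, 2], "QS", opponent=0))
```

With six unseen cards split evenly among three opponents, the estimate should approach 1/3 as the number of samples grows. A learning agent could average predicted outcomes over such sampled deals instead of enumerating every hidden state.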

Technical committee information
Technical committee	NC
Dates	2004/3/12 (one-day meeting)

Paper details
Submitted to committee	Neurocomputing (NC)
Language of text	JPN
Title (Japanese)	マルチエージェントカードゲームのための信念状態強化学習法
Title (English)	A belief-state reinforcement learning scheme for a multi-agent card game
Keyword (1) (Japanese / English)	サンプリング法 / sampling method
Keyword (2) (Japanese / English)	部分観測問題 / POMDP
Keyword (3) (Japanese / English)	非観測状態の推定 / estimation of unobservable state variables
Keyword (4) (Japanese / English)	マルチエージェント系 / multi-agent
Keyword (5) (Japanese / English)	強化学習 / reinforcement learning (RL)
First author (Japanese / English)	藤田肇 / Hajime FUJITA
First author affiliation (Japanese / English)	奈良先端科学技術大学院大学 / Nara Institute of Science and Technology
Second author (Japanese / English)	石井信 / Shin ISHII
Second author affiliation (Japanese / English)	科学技術振興事業団 CREST / CREST, Japan Science and Technology Agency
Presentation date	2004/3/12
Report number	NC2003-205
Volume	vol.103
Issue number	734
Page range	pp.-
Number of pages	6