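Policy Representation by a Multidimensional Normal Distribution Considering Covariance in Actor-critic Methods (General (Planning and Decision Making), "Intelligence in Social Systems" and General)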

Satoshi ABE; Atsushi UENO; Masatsugu KIDODE

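Presentation title	2005/3/7 Policy Representation by a Multidimensional Normal Distribution Considering Covariance in Actor-critic Methods (General (Planning and Decision Making), "Intelligence in Social Systems" and General) Satoshi ABE, Atsushi UENO, Masatsugu KIDODE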
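Abstract (translated from Japanese)	In real-world action learning problems, both the state space (the input) and the action space (the output) are often continuous. Actor-critic methods, a class of reinforcement learning methods, can be applied to problems with continuous state and action spaces, and several studies have been carried out. When the action space is continuous, a normal distribution is generally used as the probability distribution from which actions are selected (the policy). Through interaction with the environment, the agent adjusts the mean and standard deviation of the normal distribution so that it can select appropriate actions. For simplicity, conventional methods use an independent normal distribution for each dimension. However, in real problems such as manipulator trajectory planning and robot walking control, the outputs must move in coordination. Because conventional methods cannot take the correlations between outputs into account, learning coordinated behavior may become difficult or may take a long time. In this paper, we therefore propose an actor-critic method whose policy is represented by a multidimensional normal distribution that takes covariance into account, aiming at faster learning and better performance. To verify the effectiveness of the method, we take up a manipulator trajectory planning problem.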
Abstract (English)	Actor-critic methods, a class of reinforcement learning methods, can be applied readily to problems with continuous state and action spaces and have produced many results. Generally, a normal distribution is used as the probability distribution from which the agent selects actions. Through interaction with the environment, the agent updates the mean and standard deviation via the policy parameters so that it selects appropriate actions. Conventional methods use a normal distribution under the assumption that the output dimensions are independent. In problems such as manipulator trajectory planning and robot walking control, however, the outputs must cooperate with one another. Because conventional methods cannot take correlation into account, it takes a long time to acquire a high-performance policy that selects actions cooperatively. In this paper, we aim to speed up learning and improve performance by adopting a multivariate normal distribution with a variance-covariance matrix as the probability distribution for action selection. We conduct experiments on manipulator trajectory planning to demonstrate the effectiveness of this method.
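The policy representation described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the Cholesky-factor parameterization, the function names (`sample_action`, `log_prob`, `grad_log_prob_mean`, `actor_mean_step`), the learning rate, and the TD-error-weighted mean update are all assumptions chosen for illustration, following the generic likelihood-ratio form of an actor update. The sketch only shows how a full covariance matrix lets the sampled action dimensions be correlated, which independent per-dimension normal distributions cannot express.

```python
import numpy as np


def sample_action(mean, chol, rng):
    """Draw an action a ~ N(mean, chol @ chol.T) from a lower-triangular factor."""
    noise = rng.standard_normal(mean.shape[0])
    return mean + chol @ noise


def log_prob(action, mean, chol):
    """Log-density of a full-covariance normal, computed from the Cholesky factor."""
    d = mean.shape[0]
    # Solve chol @ z = (action - mean); z @ z is then the squared Mahalanobis distance.
    z = np.linalg.solve(chol, action - mean)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + z @ z)


def grad_log_prob_mean(action, mean, chol):
    """Gradient of log_prob with respect to the mean: Sigma^{-1} (action - mean)."""
    cov = chol @ chol.T
    return np.linalg.solve(cov, action - mean)


def actor_mean_step(mean, chol, action, td_error, lr=0.1):
    """Hypothetical actor update of the mean, weighted by a critic's TD error."""
    return mean + lr * td_error * grad_log_prob_mean(action, mean, chol)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean = np.zeros(2)
    # Assumed example covariance with positive correlation between the two outputs.
    cov = np.array([[1.0, 0.8], [0.8, 1.0]])
    chol = np.linalg.cholesky(cov)
    samples = np.array([sample_action(mean, chol, rng) for _ in range(5000)])
    print("empirical correlation:", np.corrcoef(samples.T)[0, 1])
```

Setting the off-diagonal entries of the covariance to zero recovers the conventional independent-per-dimension policy, so the same code can serve as a baseline for comparison.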
Keywords	reinforcement learning / actor-critic methods / multidimensional normal distribution / manipulator
Report number	AI2004-72

Technical committee information
Technical committee	AI
Dates	2005/3/7 (one-day meeting)

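Paper details
Sponsoring committee	Artificial Intelligence and Knowledge-Based Processing (AI)
Language of text	JPN
Title (translated from Japanese)	Policy Representation by a Multidimensional Normal Distribution Considering Covariance in Actor-critic Methods (General (Planning and Decision Making), "Intelligence in Social Systems" and General)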
Title (English)	Stochastic Policy Representation Using a Multidimensional Normal Distribution for Actor-critic Methods
Keyword (1)	reinforcement learning
Keyword (2)	actor-critic methods
Keyword (3)	multidimensional normal distribution
Keyword (4)	manipulator
First author (Japanese/English)	阿部哲 / Satoshi ABE
First author affiliation	Graduate School of Information Science, Nara Institute of Science and Technology
Second author (Japanese/English)	上野敦志 / Atsushi UENO
Second author affiliation	Graduate School of Engineering, Osaka City University
Third author (Japanese/English)	木戸出正継 / Masatsugu KIDODE
Third author affiliation	Graduate School of Information Science, Nara Institute of Science and Technology
Presentation date	2005/3/7
Report number	AI2004-72
Volume (vol)	vol.104
Issue (no)	726
Page range	pp.-
Number of pages	6