多腕バンディット問題における経験モーメントを用いた戦略(テーマセッション,コンピュータビジョンとパターン認識のための機械学習)

Presentation	2011-09-05 Multiarmed Bandit Algorithms Based on Empirical Moments Junya HONDA, Akimichi TAKEMURA,
PDF Download Page	PDF download Page Link
Abstract(in Japanese)	(See Japanese page)
Abstract(in English)	In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a known bounded interval, e.g. [0,1]. For this model, there exists a policy which achieves the theoretical bound asymptotically. However the optimal policy requires a computation of a convex optimization which involves the empirical distribution of each arm. In this paper, we propose a policy which exploits the first d empirical moments for arbitrary d fixed in advance. The asymptotic upper bound of the regret of the policy approaches the theoretical bound as d increases. The proposed policy requires a minimization of KL divergence with moment constraints. We show by the theory of Tchebycheff system that the optimal value is obtained by solving polynomial equations.
Keyword(in Japanese)	(See Japanese page)
Keyword(in English)	multiarmed bandit problem / Tchebycheff system / moment space / divergence minimization
Paper #	PRMU2011-60,IBISML2011-19
Date of Issue

Paper Information
Registration To	Information-Based Induction Sciences and Machine Learning (IBISML)
Language	ENG
Title (in Japanese)	(See Japanese page)
Sub Title (in Japanese)	(See Japanese page)
Title (in English)	Multiarmed Bandit Algorithms Based on Empirical Moments
Sub Title (in English)
Keyword(1)	multiarmed bandit problem
Keyword(2)	Tchebycheff system
Keyword(3)	moment space
Keyword(4)	divergence minimization
1st Author's Name	Junya HONDA
1st Author's Affiliation	Graduate School of Frontier Sciences, The University of Tokyo()
2nd Author's Name	Akimichi TAKEMURA
2nd Author's Affiliation	Graduate School of Information Science and Technology, The University of Tokyo
Date	2011-09-05
Paper #	PRMU2011-60,IBISML2011-19
Volume (vol)	vol.111
Number (no)	194
Page	pp.pp.-
#Pages	8
Date of Issue