Presentation 2011-09-05
Multiarmed Bandit Algorithms Based on Empirical Moments
Junya HONDA, Akimichi TAKEMURA,
PDF Download Page PDF download Page Link
Abstract(in Japanese) (See Japanese page)
Abstract(in English) In the multiarmed bandit problem a gambler chooses an arm of a slot machine to pull considering a tradeoff between exploration and exploitation. We study the stochastic bandit problem where each arm has a reward distribution supported in a known bounded interval, e.g. [0,1]. For this model, there exists a policy which achieves the theoretical bound asymptotically. However the optimal policy requires a computation of a convex optimization which involves the empirical distribution of each arm. In this paper, we propose a policy which exploits the first d empirical moments for arbitrary d fixed in advance. The asymptotic upper bound of the regret of the policy approaches the theoretical bound as d increases. The proposed policy requires a minimization of KL divergence with moment constraints. We show by the theory of Tchebycheff system that the optimal value is obtained by solving polynomial equations.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) multiarmed bandit problem / Tchebycheff system / moment space / divergence minimization
Paper # PRMU2011-60,IBISML2011-19
Date of Issue

Conference Information
Committee IBISML
Conference Date 2011/8/29(1days)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Information-Based Induction Sciences and Machine Learning (IBISML)
Language ENG
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Multiarmed Bandit Algorithms Based on Empirical Moments
Sub Title (in English)
Keyword(1) multiarmed bandit problem
Keyword(2) Tchebycheff system
Keyword(3) moment space
Keyword(4) divergence minimization
1st Author's Name Junya HONDA
1st Author's Affiliation Graduate School of Frontier Sciences, The University of Tokyo()
2nd Author's Name Akimichi TAKEMURA
2nd Author's Affiliation Graduate School of Information Science and Technology, The University of Tokyo
Date 2011-09-05
Paper # PRMU2011-60,IBISML2011-19
Volume (vol) vol.111
Number (no) 194
Page pp.pp.-
#Pages 8
Date of Issue