Presentation 2012-03-12
Kernel Bellman Equations in POMDPs
Yu NISHIYAMA, Abdeslam BOULARIAS, Arthur GRETTON, Kenji FUKUMIZU
Abstract(in Japanese) (See Japanese page)
Abstract(in English) We propose to handle POMDPs in reproducing kernel Hilbert spaces (RKHSs), using recent kernel methods for embedding distributions in an RKHS together with the kernel Bayes rule (KBR). We embed the Bellman equations into equations over an RKHS on the set of states and define value functions as functions over the RKHS. We then learn a policy as a mapping from elements of the RKHS to actions. We give empirical representations of the Bellman equations and use them in value iteration algorithms with the kernel Bayes rule. In experiments, we demonstrate kernel value iteration with finite horizons on some POMDP benchmarks and show that the policy converged to the one computed from the exact model. We also propose QMDP approximations based on these kernel methods, both for setting initial values and for pruning action edges to make the computation more efficient.
Keyword(in Japanese) (See Japanese page)
Keyword(in English) POMDP / RKHS / Distribution Embeddings / Value Iteration / QMDP
Paper # IBISML2011-92
Date of Issue

Conference Information
Committee IBISML
Conference Date 2012/3/5 (1 day)
Place (in Japanese) (See Japanese page)
Place (in English)
Topics (in Japanese) (See Japanese page)
Topics (in English)
Chair
Vice Chair
Secretary
Assistant

Paper Information
Registration To Information-Based Induction Sciences and Machine Learning (IBISML)
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) Kernel Bellman Equations in POMDPs
Sub Title (in English)
Keyword(1) POMDP
Keyword(2) RKHS
Keyword(3) Distribution Embeddings
Keyword(4) Value Iteration
Keyword(5) QMDP
1st Author's Name Yu NISHIYAMA
1st Author's Affiliation Center for Statistical Machine Learning Research, The Institute of Statistical Mathematics
2nd Author's Name Abdeslam BOULARIAS
2nd Author's Affiliation Max Planck Institute for Intelligent Systems
3rd Author's Name Arthur GRETTON
3rd Author's Affiliation Gatsby Unit, UCL / Max Planck Institute for Intelligent Systems
4th Author's Name Kenji FUKUMIZU
4th Author's Affiliation Center for Statistical Machine Learning Research, The Institute of Statistical Mathematics
Date 2012-03-12
Paper # IBISML2011-92
Volume (vol) vol.111
Number (no) 480
Page pp.-
#Pages 8