Presentation 2016-07-06
A New Probabilistic Topic Model Based on Variable Bin Width Histogram
Hideaki Kim, Tomoharu Iwata, Hiroshi Sawada
Abstract(in Japanese) (See Japanese page)
Abstract(in English) Probabilistic topic models, as represented by latent Dirichlet allocation (LDA), have been widely used to analyze not only categorical data but also continuous data such as the times of word appearances and price information. In topic models for continuous data, however, the component distributions need to be simple exponential-family distributions, such as normal distributions, for parameter estimation to be efficient, which limits the representational power of the model. In this paper, we overcome this limitation by incorporating the nonparametric histogram density estimator into the topic model to construct a new probabilistic topic model. The parameters, including the bin widths, are estimated by efficient collapsed Gibbs sampling. We derive estimation algorithms for both the regular and variable bin width scenarios. We apply the proposed method to synthetic data and confirm that it performs well.
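For readers unfamiliar with the histogram density estimator mentioned in the abstract, the following is a minimal sketch of a plain histogram density estimate with variable bin widths. It is not the authors' model: it contains no topic structure and no collapsed Gibbs sampling, and the bin edges are supplied by hand rather than selected from the data. The function name `histogram_density` and its parameters are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right


def histogram_density(samples, edges):
    """Estimate a piecewise-constant density on bins with arbitrary widths.

    `edges` must be strictly increasing; bin k covers [edges[k], edges[k+1]),
    except that the last bin also includes its right edge.
    Samples outside [edges[0], edges[-1]] are ignored.
    Returns one density height per bin.
    """
    edges = list(edges)
    if len(edges) < 2 or any(a >= b for a, b in zip(edges, edges[1:])):
        raise ValueError("edges must be strictly increasing with at least two values")

    counts = [0] * (len(edges) - 1)
    for x in samples:
        if edges[0] <= x <= edges[-1]:
            # Locate the bin; clamp so the rightmost edge falls in the last bin.
            k = min(bisect_right(edges, x) - 1, len(counts) - 1)
            counts[k] += 1

    n = sum(counts)
    if n == 0:
        raise ValueError("no samples fall inside the given edges")

    # Height = fraction of samples in the bin divided by the bin width,
    # so the heights integrate to 1 over the covered range.
    return [c / (n * (edges[k + 1] - edges[k])) for k, c in enumerate(counts)]


# Illustrative data (made up for this sketch): narrow bin where points are dense.
samples = [0.1, 0.2, 0.3, 0.4, 1.5, 3.0]
edges = [0.0, 0.5, 2.0, 4.0]
heights = histogram_density(samples, edges)
```

Dividing each bin's sample fraction by its own width is what lets narrow bins resolve dense regions while wide bins cover sparse ones, with the estimate still integrating to one.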
Keyword(in Japanese) (See Japanese page)
Keyword(in English) LDA / topic model / histogram / bin width selection
Paper # IBISML2016-6
Date of Issue 2016-06-28 (IBISML)

Conference Information
Committee NC / IPSJ-BIO / IBISML / IPSJ-MPS
Conference Date 2016/7/4(3days)
Place (in Japanese) (See Japanese page)
Place (in English) Okinawa Institute of Science and Technology
Topics (in Japanese) (See Japanese page)
Topics (in English) Machine Learning Approach to Biodata Mining, and General
Chair Shigeo Sato(Tohoku Univ.) / / Kenji Fukumizu(ISM)
Vice Chair Masafumi Hagiwara(Keio Univ.) / / Masashi Sugiyama(Univ. of Tokyo) / Hisashi Kashima(Kyoto Univ.)
Secretary Masafumi Hagiwara(Kyoto Sangyo Univ.) / (Tokyo Inst. of Tech.) / Masashi Sugiyama / Hisashi Kashima(Univ. of Tokyo) / (Nagoya Inst. of Tech.)
Assistant Hisanao Akima(Tohoku Univ.) / Yoshihisa Shinozawa(Keio Univ.) / / Toshihiro Kamishima(AIST) / Tomoharu Iwata(NTT)

Paper Information
Registration To Technical Committee on Neurocomputing / Special Interest Group on Bioinformatics and Genomics / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Mathematical Modeling and Problem Solving
Language JPN
Title (in Japanese) (See Japanese page)
Sub Title (in Japanese) (See Japanese page)
Title (in English) A New Probabilistic Topic Model Based on Variable Bin Width Histogram
Sub Title (in English)
Keyword(1) LDA
Keyword(2) topic model
Keyword(3) histogram
Keyword(4) bin width selection
1st Author's Name Hideaki Kim
1st Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
2nd Author's Name Tomoharu Iwata
2nd Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
3rd Author's Name Hiroshi Sawada
3rd Author's Affiliation Nippon Telegraph and Telephone Corporation(NTT)
Date 2016-07-06
Paper # IBISML2016-6
Volume (vol) vol.116
Number (no) IBISML-121
Page pp.217-223(IBISML)
#Pages 7
Date of Issue 2016-06-28 (IBISML)