Presentation | 2015-06-23: Corpus and Topic Scalable Topic Model. Soma Yokoi, Issei Sato, Hiroshi Nakagawa |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Topic models with high-dimensional topics are known to improve the performance of information retrieval applications such as search engines and online advertising, because they help to model long-tail words in large-scale corpora. However, combining high-dimensional topics with large corpora causes two problems: computational cost and memory requirements. For the fundamental topic model, LDA, SGRLD LDA has been proposed to scale to large corpora, and AliasLDA to accelerate topic computation. In this paper, we propose a method that scales in both topic computation and data size by combining these techniques. In addition, careful calculation of the gradients reduces the memory required, in expectation. Experiments demonstrate that our method scales with both corpus size and topic dimension, and that it runs faster than the existing approach, in particular more than 10 times faster in the high-dimensional topic setting. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | topic modeling / Langevin MCMC / alias method / scalability |
Paper # | IBISML2015-5 |
Date of Issue | 2015-06-16 (IBISML) |
Conference Information | |
Committee | NC / IPSJ-BIO / IBISML / IPSJ-MPS |
---|---|
Conference Date | 2015/6/23 (3 days) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | Okinawa Institute of Science and Technology |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | Machine Learning Approach to Biodata Mining, and General |
Chair | Toshimichi Saito(Hosei Univ.) / Masakazu Sekijima(Tokyo Inst. of Tech.) / Takashi Washio(Osaka Univ.) / Hayaru Shouno(Univ. of Electro-Communications) |
Vice Chair | Shigeo Sato(Tohoku Univ.) / / Kenji Fukumizu(ISM) / Masashi Sugiyama(Tokyo Inst. of Tech.) |
Secretary | Shigeo Sato(Kyushu Inst. of Tech.) / (Kyoto Sangyo Univ.) / Kenji Fukumizu(Kyoto Univ.) / Masashi Sugiyama(Ochanomizu Univ.) / (OIST) |
Assistant | Hiroyuki Kanbara(Tokyo Inst. of Tech.) / Hisanao Akima(Tohoku Univ.) / / Koji Tsuda(Univ. of Tokyo) / Hisashi Kashima(Kyoto Univ.) |
Paper Information | |
Registration To | Technical Committee on Neurocomputing / Special Interest Group on Bioinformatics and Genomics / Technical Committee on Information-Based Induction Sciences and Machine Learning / Special Interest Group on Mathematical Modeling and Problem Solving |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Corpus and Topic Scalable Topic Model |
Sub Title (in English) | |
Keyword(1) | topic modeling |
Keyword(2) | Langevin MCMC |
Keyword(3) | alias method |
Keyword(4) | scalability |
1st Author's Name | Soma Yokoi |
1st Author's Affiliation | The University of Tokyo(UTokyo) |
2nd Author's Name | Issei Sato |
2nd Author's Affiliation | The University of Tokyo(UTokyo) |
3rd Author's Name | Hiroshi Nakagawa |
3rd Author's Affiliation | The University of Tokyo(UTokyo) |
Date | 2015-06-23 |
Paper # | IBISML2015-5 |
Volume (vol) | vol.115 |
Number (no) | IBISML-112 |
Page | pp.27-31 (IBISML) |
#Pages | 5 |
Date of Issue | 2015-06-16 (IBISML) |