Presentation | 2014-06-21 Determining the Number of Topics for LDA Method and Evaluating Extracted Topics : With an Application to Twitter Streaming Data Iwao FUJINO, Yuko HOSHINO |
---|---|
PDF Download Page | PDF download Page Link |
Abstract(in Japanese) | (See Japanese page) |
Abstract(in English) | Topic models are an emerging approach to summarizing data, especially text data, in terms of a small set of latent variables. The most useful implementation of a topic model is the LDA method, an unsupervised machine learning technique for identifying latent topic information in a massive document collection. However, the LDA method sometimes yields topics that are hard to understand or meaningless. To address this problem, we propose a method for refining LDA results and for ranking topics by a significance criterion. Our study rests on two assumptions. The first is that, under ideal conditions, the correlation coefficient between any two different topics should be zero. The second is that the quality of a topic can be defined as its deviation from the usual word distribution. Starting from these two assumptions, we provide a concrete method for determining the number of topics when using the LDA method to extract topics from document data, and for ranking the LDA results in order of quality. To confirm the proposed methods, we conducted experiments on Twitter streaming data. The results of these experiments show that our methods work efficiently, as expected. |
Keyword(in Japanese) | (See Japanese page) |
Keyword(in English) | Topic model / LDA(Latent Dirichlet Allocation) / Correlation coefficient / JS divergence / Twitter |
Paper # | DE2014-16 |
Date of Issue |
Conference Information | |
Committee | DE |
---|---|
Conference Date | 2014/6/14 (1 day) |
Place (in Japanese) | (See Japanese page) |
Place (in English) | |
Topics (in Japanese) | (See Japanese page) |
Topics (in English) | |
Chair | |
Vice Chair | |
Secretary | |
Assistant |
Paper Information | |
Registration To | Data Engineering (DE) |
---|---|
Language | JPN |
Title (in Japanese) | (See Japanese page) |
Sub Title (in Japanese) | (See Japanese page) |
Title (in English) | Determining the Number of Topics for LDA Method and Evaluating Extracted Topics : With an Application to Twitter Streaming Data |
Sub Title (in English) | |
Keyword(1) | Topic model |
Keyword(2) | LDA(Latent Dirichlet Allocation) |
Keyword(3) | Correlation coefficient |
Keyword(4) | JS divergence |
Keyword(5) | |
1st Author's Name | Iwao FUJINO |
1st Author's Affiliation | School of information and telecommunication engineering, Tokai University |
2nd Author's Name | Yuko HOSHINO |
2nd Author's Affiliation | School of information and telecommunication engineering, Tokai University |
Date | 2014-06-21 |
Paper # | DE2014-16 |
Volume (vol) | vol.114 |
Number (no) | 101 |
Page | pp.- |
#Pages | 6 |
Date of Issue |
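
The two assumptions stated in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give their formulas, so the code below assumes that topics are rows of a topic-word probability matrix, that topic overlap is summarized by the mean absolute Pearson correlation between rows, and that topic quality is the JS divergence (base 2) between a topic and a background word distribution. The function names `mean_abs_topic_correlation`, `js_divergence`, `rank_topics`, and `pick_num_topics` are hypothetical, and the toy numbers are made up for illustration.

```python
import numpy as np


def mean_abs_topic_correlation(topic_word):
    """Mean absolute Pearson correlation over all pairs of distinct topics.

    topic_word: array of shape (num_topics, vocab_size), one row per topic.
    Assumption: a value near zero means the topics overlap little.
    """
    corr = np.corrcoef(np.asarray(topic_word, dtype=float))
    k = corr.shape[0]
    off_diagonal = corr[~np.eye(k, dtype=bool)]
    return float(np.mean(np.abs(off_diagonal)))


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(0.5 * kl(p, m) + 0.5 * kl(q, m))


def rank_topics(topic_word, background):
    """Return topic indices sorted from most to least divergent, plus scores."""
    scores = [js_divergence(row, background) for row in topic_word]
    order = [int(i) for i in np.argsort(scores)[::-1]]
    return order, scores


def pick_num_topics(candidates):
    """Hypothetical selection rule: choose the topic count whose fitted
    topic-word matrix has the smallest mean absolute correlation.

    candidates: dict mapping a topic count to its topic-word matrix.
    """
    return min(candidates, key=lambda k: mean_abs_topic_correlation(candidates[k]))


# Toy data (made up): three topics over a four-word vocabulary.
topics = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
])
background = np.array([0.1, 0.1, 0.4, 0.4])  # stand-in for the usual word distribution

order, scores = rank_topics(topics, background)
overlap = mean_abs_topic_correlation(topics)
print("ranking:", order)
print("scores:", scores)
print("mean |correlation|:", overlap)
```

In this toy setup the third topic is identical to the background distribution, so it receives a divergence of zero and is ranked last. In practice the topic-word matrices would come from fitted LDA models, one per candidate number of topics, and `pick_num_topics` would compare them.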